Polynucleotides for use as tags and tag complements, manufacture and use thereof

ABSTRACT

A family of minimally cross-hybridizing nucleotide sequences, methods of use, etc. A specific family of 210 24mers is described.

FIELD OF THE INVENTION

This invention relates to families of oligonucleotide tags for use, for example, in sorting molecules. Members of a given family of tags can be distinguished one from the other by specific hybridization to their tag complements.

BACKGROUND OF THE INVENTION

Specific hybridization of oligonucleotides and their analogs is a fundamental process that is employed in a wide variety of research, medical, and industrial applications, including the identification of disease-related polynucleotides in diagnostic assays, screening for clones of novel target polynucleotides, identification of specific polynucleotides in blots of mixtures of polynucleotides, therapeutic blocking of inappropriately expressed genes and DNA sequencing. Sequence-specific hybridization is critical in the development of high-throughput multiplexed nucleic acid assays. As formats for these assays expand to encompass larger amounts of sequence information acquired through projects such as the Human Genome Project, achieving sequence-specific hybridization with high fidelity is becoming increasingly difficult.

In large part, the success of hybridization using oligonucleotides depends on minimizing the number of false positives and false negatives. Such problems have made the simultaneous use of multiple hybridization probes in a single experiment, i.e., multiplexing, very difficult, particularly in the analysis of multiple gene sequences on a gene microarray. For example, in certain binding assays, a number of nucleic acid molecules are bound to a chip with the desire that a given "target" sequence will bind selectively to its complement attached to the chip. Approaches have been developed that involve the use of oligonucleotide tags attached to a solid support that can be used to specifically hybridize to the tag complements that are coupled to probe sequences. Chetverin et al. (WO 93/17126) use sectioned, binary oligonucleotide arrays to sort and survey nucleic acids. These arrays have a constant nucleotide sequence attached to an adjacent variable nucleotide sequence, both bound to a solid support by a covalent linking moiety. These binary arrays have advantages compared with ordinary arrays in that they can be used to sort strands according to their terminal sequences so that each strand binds to a fixed location on an array. The design of the terminal sequences in this approach comprises the use of constant and variable sequences. U.S. Pat. Nos. 6,103,463 and 6,322,971 issued to Chetverin et al. on Aug. 15, 2000 and Nov. 27, 2001, respectively.

This concept of using molecular tags to sort a mixture of molecules is analogous to molecular tags developed for bacterial and yeast genetics (Hensel et al., Science; 269, 400-403: 1995 and Shoemaker et al., Nature Genetics; 14, 450-456: 1996). Here, a method termed "signature tagged" mutagenesis, in which each mutant is tagged with a different DNA sequence, is used to recover mutant genes from a complex mixture of approximately 10,000 bacterial colonies. In the tagging approach of Barany et al. (WO 9731256), known as the "zip chip", a family of nucleic acid molecules, the "zip-code addresses", each different from the others, are set out on a grid. Target molecules are attached to oligonucleotide sequences complementary to the "zip-code addresses," referred to as "zipcodes," which are used to specifically hybridize to the address locations on the grid. While the selection of these families of polynucleotide sequences used as addresses is critical for correct performance of the assay, the performance of such families has not been described.

Working in a highly parallel hybridization environment requiring specific hybridization imposes very rigorous selection criteria for the design of the families of oligonucleotides to be used. The success of these approaches is dependent on the specific hybridization of a probe and its complement. Problems arise when members of the family of nucleic acid molecules cross-hybridize or hybridize incorrectly to the target sequences. While it is common to obtain incorrect hybridization resulting in false positives, or an inability to form hybrids resulting in false negatives, the frequency of such results must be minimized. In order to achieve this goal, certain thermodynamic properties of nucleic acid hybrid formation must be considered. The temperature at which oligonucleotides form duplexes with their complementary sequences, known as the T_(m) (the temperature at which 50% of the nucleic acid duplex is dissociated), varies according to a number of sequence-dependent properties, including the hydrogen bonding energies of the canonical pairs A-T and G-C (reflected in GC or base composition), stacking free energy and, to a lesser extent, nearest neighbour interactions. These energies vary widely among oligonucleotides that are typically used in hybridization assays. For example, two probe sequences of 24 nucleotides, one with a 40% GC content and the other with a 60% GC content, may theoretically differ by 10° C. in melting temperature when hybridized to their complementary targets under standard conditions (Mueller et al., Current Protocols in Mol. Biol.; 15, 5: 1993). Problems in hybridization occur when the hybrids are allowed to form under conditions that include a single hybridization temperature that is not optimal for correct hybridization of all oligonucleotide sequences of a set.

Mismatch hybridization of non-complementary probes can occur, forming duplexes with measurable mismatch stability (Santalucia et al., Biochemistry; 38: 3468-77, 1999). Mismatched duplexes can form within a particular set of oligonucleotides under hybridization conditions in which the mismatch decreases duplex stability, but the mismatched duplex still has a higher T_(m) than the least stable correct duplex of that set. For example, if hybridization is carried out under conditions that favor the AT-rich perfect-match duplex sequence, a GC-rich duplex sequence that contains a mismatched base may still hybridize, because its melting temperature remains above that of the correctly formed AT-rich duplex. Therefore, the design of families of oligonucleotide sequences for use in multiplexed hybridization reactions must take into account the thermodynamic properties of oligonucleotides and duplex formation so as to reduce or eliminate cross-hybridization within the designed oligonucleotide set.
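To give a feel for the size of the GC effect, the arithmetic can be sketched with the Wallace rule of thumb, T_(m) ≈ 2° C. × (number of A/T) + 4° C. × (number of G/C). This rule is a swapped-in rough approximation (it is intended mainly for short oligonucleotides and ignores salt, concentration and nearest-neighbour effects); it is not the calculation used by Mueller et al., and the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(length: int, gc_fraction: float) -> float:
    """Rough melting temperature in deg C by the Wallace rule of thumb.

    Tm ~ 2 * (number of A/T) + 4 * (number of G/C). Illustrative only:
    it ignores salt, concentration and nearest-neighbour effects.
    """
    n_gc = length * gc_fraction
    n_at = length - n_gc
    return 2.0 * n_at + 4.0 * n_gc


# Two hypothetical 24mers with 40% and 60% GC content.
tm_40 = wallace_tm(24, 0.40)
tm_60 = wallace_tm(24, 0.60)
print(f"40% GC: {tm_40:.1f} C, 60% GC: {tm_60:.1f} C, "
      f"difference: {tm_60 - tm_40:.1f} C")
```

Under this approximation the two 24mers come out at about 67.2° C. and 76.8° C., a difference of about 9.6° C., which is the same order as the 10° C. figure quoted above.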

A multiplex sequencing method has been described in U.S. Pat. No. 4,942,124, which issued to Church on Jul. 17, 1990. The method requires at least two vectors which differ from each other at a tag sequence. It is stated in the specification that a tag sequence in one vector will not hybridize under stringent hybridization conditions to a tag sequence in another vector, i.e., a complementary probe of a tag in one vector does not cross-hybridize with a tag sequence in another vector. Exemplary stringent hybridization conditions are given as 42° C. in 500-1000 mM sodium phosphate buffer. A set of 42 20-mer tag sequences, all of which lack G residues, is given in FIG. 3 of Church's specification. Details of how the sequences were obtained are not provided, although Church states that initially 92 were chosen on the basis of their having sufficient sequence diversity to ensure uniqueness.

There have been other attempts at the development of families of tags, and there are a number of different approaches for selecting sequences for use in multiplexed hybridization assays. The selection of sequences that can be used as zipcodes or tags in an addressable array has been described in the patent literature in an approach taken by Brenner and co-workers. U.S. Pat. No. 5,654,413 describes a population of oligonucleotide tags (and corresponding tag complements) in which each oligonucleotide tag includes a plurality of subunits, each subunit consisting of an oligonucleotide having a length of from three to six nucleotides and each subunit being selected from a minimally cross-hybridizing set, wherein a subunit of the set would have at least two mismatches with any other sequence of the set. Table II of the Brenner patent specification describes exemplary groups of 4mer subunits that are minimally cross-hybridizing according to the aforementioned criteria. In the approach taken by Brenner, the construction of non-cross-hybridizing oligonucleotides relies on the use of subunits that form a duplex having at least two mismatches with the complement of any other subunit of the same set. The ordering of subunits in the construction of oligonucleotide tags is not specifically defined.
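Brenner's subunit criterion can be checked mechanically. The sketch below assumes that mismatches are counted position by position between equal-length subunits (no insertions or deletions); the subunit list and the function names are hypothetical and are not taken from Table II of the Brenner patent.

```python
from itertools import combinations


def mismatches(a: str, b: str) -> int:
    """Count position-by-position mismatches between equal-length subunits."""
    if len(a) != len(b):
        raise ValueError("subunits must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_minimally_cross_hybridizing(subunits: list[str], min_mismatches: int = 2) -> bool:
    """True if every pair of subunits differs at >= min_mismatches positions."""
    return all(mismatches(a, b) >= min_mismatches
               for a, b in combinations(subunits, 2))


# Hypothetical 4mer subunits, used only to exercise the check.
subunits = ["GATT", "TGAT", "AAAG", "TGTA"]
print(is_minimally_cross_hybridizing(subunits))            # True
print(is_minimally_cross_hybridizing(subunits + ["GATG"]))  # False: GATG vs GATT differ at 1 position
```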

Parameters used in the design of tags based on subunits are discussed in Barany et al. (WO 9731256). For example, they consider the design of polynucleotide sequences 24 nucleotides in length (24mers) derived from a set of four possible tetramers, in which each 24mer "address" differs from its nearest 24mer neighbour by 3 tetramers. They further note that, if each tetramer differs from every other by at least two nucleotides, then each 24mer will differ from the next by at least six nucleotides. This is determined without consideration of insertions or deletions when forming the alignment between any two sequences of the set. In this way a unique "zip code" sequence is generated. The zip code is ligated to a label in a target-dependent manner, resulting in a unique "zip code" which is then allowed to hybridize to its address on the chip. To minimize cross-hybridization of a "zip code" to other "addresses", the hybridization reaction is carried out at temperatures of 75-80° C. Due to the high-temperature conditions for hybridization, 24mers that have partial homology hybridize to a lesser extent than sequences with perfect complementarity and represent 'dead zones'. This approach of implementing stringent hybridization conditions, for example involving high-temperature hybridization, is also practiced by Brenner et al.
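The six-nucleotide figure follows because, when insertions and deletions are ignored, mismatches add up slot by slot: 3 differing tetramers × at least 2 mismatches each gives at least 6. A minimal sketch, with a hypothetical tetramer set and hypothetical addresses chosen only for illustration:

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Position-by-position mismatches between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build(tetramers: list[str], indices: list[int]) -> str:
    """Concatenate tetramers (by index) into one address sequence."""
    return "".join(tetramers[i] for i in indices)


# Hypothetical tetramer set whose members differ pairwise at >= 2 positions.
tetramers = ["GATT", "TGAT", "AAAG", "TGTA"]
min_sub = min(hamming(a, b) for a, b in combinations(tetramers, 2))

# Two hypothetical 24mer addresses that differ in 3 of their 6 tetramer slots.
a_idx = [0, 1, 2, 3, 0, 1]
b_idx = [1, 1, 2, 0, 0, 2]
slots_differing = sum(i != j for i, j in zip(a_idx, b_idx))

addr_a, addr_b = build(tetramers, a_idx), build(tetramers, b_idx)
print(len(addr_a), slots_differing, min_sub)                     # 24 3 2
print(hamming(addr_a, addr_b), ">=", slots_differing * min_sub)  # 9 >= 6
```

The bound is only a floor: here each differing slot happens to contribute three mismatches, so the two addresses differ at nine positions.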

The current state of technology for designing non-cross-hybridizing tags based on subunits does not provide sufficient guidance to construct a family of sequences with practical value in assays that require stringent non-cross-hybridizing behavior.

Thus, while it is desirable with such arrays to have, at once, a large number of address molecules, the address molecules should each be highly selective for its own complement sequence. While such an array provides the advantage that the family of molecules making up the grid is entirely of design, and does not rely on sequences as they occur in nature, the provision of a family of molecules which is sufficiently large and where each individual member is sufficiently selective for its complement over all the other zipcode molecules (i.e., where there is sufficiently low cross-hybridization, or cross-talk) continues to elude researchers.

SUMMARY OF INVENTION

Using the method of Benight et al. (described in commonly-owned international patent application No. PCT/CA 01/00141 published under WO 01/59151 on Aug. 16, 2001), a computer algorithm was used to obtain a family of 100 nucleotide sequences having optimal hybridization properties for use in nucleic acid detection assays. The sequence set of 100 oligonucleotides was characterized in hybridization assays, demonstrating the ability of family members to correctly hybridize to their complementary sequences with an absence of cross-hybridization. These are the sequences having SEQ ID NOs:1 to 100 of Table I. This set of sequences has been expanded to include an additional 110 sequences that can be grouped with the original 100 sequences as having non-cross-hybridizing properties, based on the characteristics of the original set of 100 sequences. These additional sequences are identified as SEQ ID NOs:101 to 210 of the sequences in Table I. How these sequences were obtained is described below.

Variant families of sequences (seen as tags or tag complements) of a family of sequences taken from Table I are also part of the invention. For the purposes of discussion, families of tag complements will be described.

A family of complements is obtained from a set of oligonucleotides based on a family of oligonucleotides such as those of Table I. For illustrative purposes, the provision of a family of complements based on the oligonucleotides of Table I will be described.

Firstly, sequences based on the oligonucleotides of Table I can be represented as follows:

TABLE IA
Numeric sequences corresponding to word patterns of a set of oligonucleotides

| Sequence Identifier | Numeric Pattern |
|---|---|
| 1 | 1 4 6 6 1 3 |
| 2 | 2 4 5 5 2 3 |
| 3 | 1 8 1 2 3 4 |
| 4 | 1 7 1 9 8 4 |
| 5 | 1 1 9 2 6 9 |
| 6 | 1 2 4 3 9 6 |
| 7 | 9 8 9 8 10 9 |
| 8 | 9 1 2 3 8 10 |
| 9 | 8 8 7 4 3 1 |
| 10 | 1 1 1 1 1 2 |
| 11 | 2 1 3 3 2 2 |
| 12 | 3 1 2 2 3 2 |
| 13 | 4 1 4 4 4 2 |
| 14 | 1 2 3 3 1 1 |
| 15 | 1 3 2 2 1 4 |
| 16 | 3 3 3 3 3 4 |
| 17 | 4 3 1 1 4 4 |
| 18 | 3 4 1 1 3 3 |
| 19 | 3 6 6 6 3 5 |
| 20 | 6 6 1 1 6 5 |
| 21 | 7 6 7 7 7 5 |
| 22 | 8 7 5 5 8 8 |
| 23 | 2 1 7 7 1 1 |
| 24 | 2 3 2 3 1 3 |
| 25 | 2 6 5 6 1 6 |
| 26 | 4 8 1 1 3 8 |
| 27 | 5 3 1 1 6 3 |
| 28 | 5 6 8 8 6 6 |
| 29 | 8 3 6 5 7 3 |
| 30 | 1 2 3 1 4 6 |
| 31 | 1 5 7 5 4 3 |
| 32 | 2 1 6 7 3 6 |
| 33 | 2 6 1 3 3 1 |
| 34 | 2 7 6 8 3 1 |
| 35 | 3 4 3 1 2 5 |
| 36 | 3 5 6 1 2 7 |
| 37 | 3 6 1 7 2 7 |
| 38 | 4 6 3 5 1 7 |
| 39 | 5 4 6 3 8 6 |
| 40 | 6 8 2 3 7 1 |
| 41 | 7 1 7 8 6 3 |
| 42 | 7 3 4 1 6 8 |
| 43 | 4 7 7 1 2 4 |
| 44 | 3 6 5 2 6 3 |
| 45 | 1 4 1 4 6 1 |
| 46 | 3 3 1 4 8 1 |
| 47 | 8 3 3 5 3 8 |
| 48 | 1 3 6 6 3 7 |
| 49 | 7 3 8 6 4 7 |
| 50 | 3 1 3 7 8 6 |
| 51 | 10 9 5 5 10 10 |
| 52 | 7 10 10 10 7 9 |
| 53 | 9 9 7 7 10 9 |
| 54 | 9 3 10 3 10 3 |
| 55 | 9 6 3 4 10 6 |
| 56 | 10 4 10 3 9 4 |
| 57 | 3 9 3 10 4 9 |
| 58 | 9 10 5 9 4 8 |
| 59 | 3 9 4 9 10 7 |
| 60 | 3 5 9 4 10 8 |
| 61 | 4 10 5 4 9 3 |
| 62 | 5 3 3 9 8 10 |
| 63 | 6 8 6 9 7 10 |
| 64 | 4 6 10 9 6 4 |
| 65 | 4 9 8 10 8 3 |
| 66 | 7 7 9 10 5 3 |
| 67 | 8 8 9 3 9 10 |
| 68 | 8 10 2 9 5 9 |
| 69 | 9 6 2 2 7 10 |
| 70 | 9 7 5 3 10 6 |
| 71 | 10 3 6 8 9 2 |
| 72 | 10 9 3 2 7 3 |
| 73 | 8 9 10 3 6 2 |
| 74 | 3 2 5 10 8 9 |
| 75 | 8 2 3 10 2 9 |
| 76 | 6 3 9 8 2 10 |
| 77 | 3 7 3 9 9 10 |
| 78 | 9 10 1 1 9 4 |
| 79 | 10 1 9 1 4 1 |
| 80 | 7 1 10 9 8 1 |
| 81 | 9 1 10 1 10 6 |
| 82 | 9 6 9 1 3 10 |
| 83 | 3 10 8 8 9 1 |
| 84 | 3 8 1 9 10 3 |
| 85 | 9 10 1 3 6 9 |
| 86 | 1 9 1 10 3 1 |
| 87 | 1 4 9 6 8 10 |
| 88 | 3 3 9 6 1 10 |
| 89 | 5 3 1 6 9 10 |
| 90 | 6 1 8 10 9 6 |
| 91 | 5 9 9 4 10 3 |
| 92 | 2 10 9 1 9 5 |
| 93 | 10 10 7 2 1 9 |
| 94 | 10 9 9 1 8 2 |
| 95 | 1 8 6 8 9 10 |
| 96 | 1 9 1 3 8 10 |
| 97 | 9 6 9 10 1 2 |
| 98 | 1 10 8 9 9 2 |
| 99 | 1 9 6 7 2 9 |
| 100 | 4 3 9 3 5 1 |
| 101 | 5 11 10 14 12 1 |
| 102 | 7 12 4 13 3 2 |
| 103 | 5 5 4 4 12 9 |
| 104 | 2 13 13 11 13 13 |
| 105 | 10 2 5 4 12 7 |
| 106 | 11 7 4 11 6 4 |
| 107 | 12 12 1 9 11 11 |
| 108 | 12 9 4 14 12 6 |
| 109 | 12 7 13 2 9 11 |
| 110 | 9 11 3 4 1 3 |
| 111 | 10 5 12 11 4 4 |
| 112 | 4 13 7 12 1 5 |
| 113 | 9 13 10 11 11 6 |
| 114 | 10 14 14 10 1 3 |
| 115 | 2 14 1 10 4 5 |
| 116 | 10 12 12 7 11 10 |
| 117 | 9 11 2 12 8 11 |
| 118 | 2 8 5 2 12 14 |
| 119 | 1 8 13 3 7 8 |
| 120 | 9 4 7 5 4 2 |
| 121 | 13 2 12 7 1 12 |
| 122 | 11 10 9 7 5 11 |
| 123 | 8 12 2 2 12 7 |
| 124 | 5 2 14 3 4 13 |
| 125 | 1 8 8 1 5 9 |
| 126 | 14 5 11 10 13 3 |
| 127 | 14 1 4 13 2 4 |
| 128 | 4 4 5 11 3 10 |
| 129 | 10 9 2 3 3 11 |
| 130 | 11 4 8 14 3 4 |
| 131 | 5 1 14 8 11 2 |
| 132 | 14 3 11 6 12 5 |
| 133 | 13 4 4 1 10 1 |
| 134 | 6 10 11 6 5 1 |
| 135 | 5 8 12 5 1 7 |
| 136 | 4 5 9 6 9 2 |
| 137 | 13 2 4 4 2 3 |
| 138 | 11 2 2 5 9 3 |
| 139 | 8 1 10 12 2 8 |
| 140 | 12 7 9 11 4 1 |
| 141 | 12 1 4 14 3 13 |
| 142 | 11 2 7 10 4 1 |
| 143 | 3 4 12 11 11 11 |
| 144 | 3 3 4 2 12 11 |
| 145 | 1 5 9 4 2 1 |
| 146 | 6 1 12 2 10 5 |
| 147 | 10 5 1 12 2 14 |
| 148 | 2 11 7 9 4 11 |
| 149 | 7 4 4 5 14 12 |
| 150 | 12 5 2 1 10 12 |
| 151 | 5 9 2 11 6 1 |
| 152 | 12 14 3 6 1 14 |
| 153 | 5 9 11 10 1 4 |
| 154 | 2 5 12 14 10 10 |
| 155 | 4 5 8 4 5 6 |
| 156 | 10 12 4 6 12 5 |
| 157 | 4 2 1 13 6 8 |
| 158 | 9 10 10 14 5 3 |
| 159 | 6 14 10 11 3 3 |
| 160 | 2 9 10 12 5 7 |
| 161 | 13 3 7 10 5 12 |
| 162 | 6 4 1 2 5 13 |
| 163 | 6 1 13 4 14 13 |
| 164 | 2 12 1 14 1 9 |
| 165 | 4 11 13 2 6 10 |
| 166 | 1 10 7 4 5 8 |
| 167 | 7 2 2 10 13 4 |
| 168 | 8 2 11 4 6 14 |
| 169 | 4 8 2 6 2 3 |
| 170 | 7 1 12 11 2 9 |
| 171 | 5 6 10 4 13 4 |
| 172 | 5 10 4 11 9 3 |
| 173 | 3 11 9 3 2 3 |
| 174 | 8 15 6 20 17 19 |
| 175 | 21 10 15 3 7 11 |
| 176 | 11 7 17 20 14 9 |
| 177 | 16 6 17 13 21 21 |
| 178 | 10 15 22 6 17 21 |
| 179 | 15 7 17 10 22 22 |
| 180 | 3 20 8 15 20 16 |
| 181 | 17 21 10 16 6 22 |
| 182 | 6 21 14 14 14 16 |
| 183 | 7 17 3 20 10 7 |
| 184 | 16 19 14 17 7 21 |
| 185 | 20 16 7 15 22 10 |
| 186 | 20 10 18 11 22 18 |
| 187 | 18 7 19 15 7 22 |
| 188 | 21 18 7 21 16 3 |
| 189 | 14 13 7 22 17 13 |
| 190 | 19 7 8 12 10 17 |
| 191 | 15 3 21 14 9 7 |
| 192 | 19 6 15 7 14 14 |
| 193 | 4 17 10 15 20 19 |
| 194 | 21 6 18 4 20 16 |
| 195 | 2 19 8 17 6 13 |
| 196 | 12 12 6 17 4 20 |
| 197 | 16 21 12 10 19 16 |
| 198 | 14 14 15 2 7 21 |
| 199 | 8 16 21 6 22 16 |
| 200 | 14 17 22 14 17 20 |
| 201 | 10 21 7 15 21 18 |
| 202 | 16 13 20 18 21 12 |
| 203 | 15 7 4 22 14 13 |
| 204 | 7 19 14 8 15 4 |
| 205 | 4 5 3 20 7 16 |
| 206 | 22 18 6 18 13 20 |
| 207 | 19 6 16 3 13 3 |
| 208 | 18 6 22 7 20 18 |
| 209 | 10 17 11 21 8 13 |
| 210 | 7 10 17 19 10 14 |

Here, each of the numerals 1 to 22 (numeric identifiers) represents a 4mer, and the pattern of numerals 1 to 22 of the sequences in the above list corresponds to the pattern of tetrameric oligonucleotide segments present in the oligonucleotides of Table I, which oligonucleotides have been found to be non-cross-hybridizing, as described further in the detailed examples. Each 4mer is selected from the group of 4mers consisting of WWWW, WWWX, WWWY, WWXW, WWXX, WWXY, WWYW, WWYX, WWYY, WXWW, WXWX, WXWY, WXXW, WXXX, WXXY, WXYW, WXYX, WXYY, WYWW, WYWX, WYWY, WYXW, WYXX, WYXY, WYYW, WYYX, WYYY, XWWW, XWWX, XWWY, XWXW, XWXX, XWXY, XWYW, XWYX, XWYY, XXWW, XXWX, XXWY, XXXW, XXXX, XXXY, XXYW, XXYX, XXYY, XYWW, XYWX, XYWY, XYXW, XYXX, XYXY, XYYW, XYYX, XYYY, YWWW, YWWX, YWWY, YWXW, YWXX, YWXY, YWYW, YWYX, YWYY, YXWW, YXWX, YXWY, YXXW, YXXX, YXXY, YXYW, YXYX, YXYY, YYWW, YYWX, YYWY, YYXW, YYXX, YYXY, YYYW, YYYX, and YYYY. Here W, X and Y represent nucleotide bases (A, G, C, etc.), the assignment of bases being made according to rules described below.
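The list above is every four-letter word over the three symbols W, X and Y (3⁴ = 81 words). A short sketch that regenerates it, which can be handy for checking a transcription of the list; the name `WORDS` is illustrative.

```python
from itertools import product

# All 4-letter words over the three symbols W, X and Y: 3**4 = 81 words.
# itertools.product yields them in lexicographic order (W < X < Y), which is
# the order in which they are listed above.
WORDS = ["".join(p) for p in product("WXY", repeat=4)]

print(len(WORDS))            # 81
print(WORDS[:3], WORDS[-1])  # ['WWWW', 'WWWX', 'WWWY'] YYYY
```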

Given this numeric pattern, a 4mer is assigned to each numeral. For example, 1=WXYY, 2=YWXY, etc. Once a given 4mer has been assigned to a given numeral, it is not assigned for use in the position of a different numeral. It is possible, however, to assign a different 4mer to the same numeral. That is, for example, the numeral 1 in one position could be assigned WXYY and another numeral 1, in a different position, could be assigned XXXW, but none of the other numerals 2 to 22 can then be assigned WXYY or XXXW. In other words, each of 1 to 22 is assigned a 4mer from the list of eighty-one 4mers indicated so as to be different from all of the others of 1 to 22.

In the case of the specific oligonucleotides given in Table I, 1=WXYY, 2=YWXY, 3=XXXW, 4=YWYX, 5=WYXY, 6=YYWX, 7=YWXX, 8=WYXX, 9=XYYW, 10=XYWX, 11=YYXW, 12=WYYX, 13=XYXW, 14=WYYY, 15=WXYW, 16=WYXW, 17=WXXW, 18=WYYW, 19=XYYX, 20=YXYX, 21=YXXY and 22=XYXY.
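The assignment rule can be checked mechanically for this case. The sketch below covers only the simple situation in which each numeral maps to a single word, as in Table I (the rule also allows a numeral to take different words in different positions, which is not modelled here); `TABLE_I_WORDS` and `check_assignment` are hypothetical names.

```python
from itertools import product

# Word assignment for numerals 1..22 as stated for Table I.
TABLE_I_WORDS = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY", 6: "YYWX",
    7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX", 11: "YYXW", 12: "WYYX",
    13: "XYXW", 14: "WYYY", 15: "WXYW", 16: "WYXW", 17: "WXXW", 18: "WYYW",
    19: "XYYX", 20: "YXYX", 21: "YXXY", 22: "XYXY",
}

# The 81 permitted 4mers over W, X and Y.
ALLOWED = {"".join(p) for p in product("WXY", repeat=4)}


def check_assignment(assignment: dict[int, str]) -> bool:
    """True if every word is one of the 81 permitted 4mers and no word
    is shared by two different numerals."""
    words = list(assignment.values())
    return all(w in ALLOWED for w in words) and len(set(words)) == len(words)


print(check_assignment(TABLE_I_WORDS))                  # True
print(check_assignment({**TABLE_I_WORDS, 22: "WXYY"}))  # False: 22 would reuse 1's word
```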

Once the 4mers are assigned to positions according to the above pattern, a particular set of oligonucleotides can be created by appropriate assignment of bases, A, T/U, G, C to W, X, Y. These assignments are made according to one of the following two sets of rules:

-   (i) Each of W, X and Y is a base in which:
    -   (a) W=one of A, T/U, G, and C; X=one of A, T/U, G, and C; Y=one of A, T/U, G, and C; and each of W, X and Y is selected so as to be different from all of the others of W, X and Y; and
    -   (b) an unselected said base of (i)(a) can be substituted any number of times for any one of W, X and Y;

    or
-   (ii) Each of W, X and Y is a base in which:
    -   (a) W=G or C; X=A or T/U; Y=A or T/U; and X≠Y; and
    -   (b) a base not selected in (ii)(a) can be inserted into each sequence at one or more locations, the location of each insertion being the same in each sequence as that of every other sequence of the set.
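The core choices (a) of the two rules can be enumerated to see how many base assignments each permits. This sketch models only parts (i)(a) and (ii)(a); the optional substitutions of (i)(b) and insertions of (ii)(b) are not modelled, T stands for T/U, and the variable names are illustrative.

```python
from itertools import permutations

BASES = ["A", "T", "G", "C"]  # T stands for T/U

# Rule (i)(a): W, X and Y are three different bases chosen from A, T/U, G, C.
rule_i = [dict(zip("WXY", p)) for p in permutations(BASES, 3)]

# Rule (ii)(a): W is G or C; X and Y are A or T/U with X != Y.
rule_ii = [{"W": w, "X": x, "Y": y}
           for w in ("G", "C")
           for x in ("A", "T")
           for y in ("A", "T")
           if x != y]

print(len(rule_i), len(rule_ii))  # 24 4

# The assignment used for Table I satisfies both rules.
table_i = {"W": "G", "X": "A", "Y": "T"}
print(table_i in rule_i, table_i in rule_ii)  # True True
```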

In the case of the specific oligonucleotides given in Table I, W=G, X=A and Y=T.
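With the Table I word assignments and W=G, X=A, Y=T, any row of Table IA can be decoded into a base sequence. A minimal sketch; `decode` is a hypothetical helper, and because Table I itself is not reproduced here, the decoded string is what these stated rules produce rather than a value checked against Table I.

```python
# Word assignment for numerals 1..22 as stated for Table I.
WORDS = ["WXYY", "YWXY", "XXXW", "YWYX", "WYXY", "YYWX", "YWXX", "WYXX",
         "XYYW", "XYWX", "YYXW", "WYYX", "XYXW", "WYYY", "WXYW", "WYXW",
         "WXXW", "WYYW", "XYYX", "YXYX", "YXXY", "XYXY"]
NUMERAL_TO_WORD = {i + 1: w for i, w in enumerate(WORDS)}

# Base assignment stated for Table I.
BASE = {"W": "G", "X": "A", "Y": "T"}


def decode(pattern: list[int]) -> str:
    """Turn a numeric pattern (a row of Table IA) into a base sequence."""
    return "".join(BASE[s] for n in pattern for s in NUMERAL_TO_WORD[n])


# Pattern listed for sequence identifier 1 in Table IA.
print(decode([1, 4, 6, 6, 1, 3]))  # GATTTGTATTGATTGAGATTAAAG
```

Decoding each numeral on its own gives GATT, TGAT, AAAG and so on, which matches the 4mers listed for each numeral in (G) below.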

In any case, given a set of oligonucleotides generated according to one of these sets of rules, it is possible to modify the members of a given set in relatively minor ways and thereby obtain a different set of sequences while substantially maintaining the cross-hybridization properties of the set subject to such modification. In particular, it is possible to insert up to 3 of A, T/U, G and C at any location of any sequence of the set of sequences. Alternatively, or additionally, up to 3 bases can be deleted from any sequence of the set of sequences.

A person skilled in the art would understand that, given a set of oligonucleotides having a set of properties making it suitable for use as a family of tags (or tag complements), one can obtain another family with the same properties by reversing the order of all of the members of the set. In other words, all the members can be taken to be read 5′ to 3′ or to be read 3′ to 5′.

A family of complements of the present invention is based on a given set of oligonucleotides defined as described above. Each complement of the family is based on a different oligonucleotide of the set, and each complement contains at least 10 consecutive (i.e., contiguous) bases of the oligonucleotide on which it is based. When selecting a sequence of contiguous bases, preference is given to those sets in which the contiguous bases of each oligonucleotide of a set are selected such that the position of the first base of each said oligonucleotide within the sequence on which it is based is the same for all oligonucleotides of the set. Thus, for example, if a nucleotide sequence of twenty contiguous bases corresponds to bases 3 to 22 of the sequence on which the nucleotide sequence is based, then preferably the twenty contiguous bases for all nucleotide sequences correspond to bases 3 to 22 of the sequences on which the nucleotide sequences are based. For a given family of complements where one is seeking to reduce or minimize inter-sequence similarity that would result in cross-hybridization, each and every pair of complements meets particular homology requirements. Particularly, subject to limited exceptions described below, any two complements within a set of complements are generally required to have a defined amount of dissimilarity.
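The preferred same-offset selection amounts to cutting every sequence of the set with the same window. A sketch, assuming 1-based inclusive base positions as in the text; `same_window` is a hypothetical helper and the two 24mers are illustrative.

```python
def same_window(sequences: list[str], start: int, length: int) -> list[str]:
    """Take `length` contiguous bases starting at 1-based position `start`,
    using the same position in every sequence of the set."""
    if length < 10:
        raise ValueError("each complement needs at least 10 contiguous bases")
    end = start - 1 + length
    if any(end > len(s) for s in sequences):
        raise ValueError("window runs past the end of a sequence")
    return [s[start - 1:end] for s in sequences]


seqs = ["GATTTGTATTGATTGAGATTAAAG", "GTAAGATGTTGATATAGAAGATTA"]
# Bases 3 to 22 of each sequence (twenty contiguous bases), as in the example above.
for window in same_window(seqs, start=3, length=20):
    print(window)
```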

In order to understand notionally these requirements for dissimilarity as they apply to a given pair of complements of a family, a phantom sequence is generated from the pair of complements. A "phantom" sequence is a single sequence that is generated from a pair of complements by selection, from each complement of the pair, of a string of bases wherein the bases of the string occur in the same order in both complements. An object of creating such a phantom sequence is to provide a convenient and objective means of comparing the sequence identity of the two parent sequences from which the phantom sequence is created.

A phantom sequence can be considered to be similar in concept to a consensus sequence, with which a person skilled in the art would be familiar, except that a consensus sequence typically consists of all bases from both parent sequences, with each position reflecting the most common choice of base at that position (the union of both sequences), whereas the "phantom" sequence consists only of bases which occur in the same order in both parent sequences (the intersection of both sequences). Also, a consensus sequence usually indicates a common phylogenetic ancestry for the two sequences (or more than 2 sequences, depending on how many sequences are used to generate the consensus sequence), whereas the "phantom" sequence definition has been created to specifically address the sequence similarity between two complements which have no ancestral history but may have a propensity to cross-hybridize under certain conditions.

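A phantom sequence may thus be generated from exemplary Sequence 1 and Sequence 2 as follows:

Sequence 1: ATGTTTAGTGAAAAGTTAGTATTG (* marks the T at position 4; • marks the A at position 13)

Sequence 2: ATGTTAGTGAATAGTATAGTATTG (• marks the T at position 12; ♦ marks the A at position 16)

Phantom Sequence: ATGTTAGTGAAAGTTAGTATTG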

The phantom sequence generated from these two sequences is thus 22 bases in length. That is, there are 22 identical bases in identical sequence (the same order) in Sequences 1 and 2. There is a total of three insertions/deletions and mismatches present in the phantom sequence when compared with the sequences from which it was generated:

ATGT-TAGTGAA-AGT-TAGTATTG

The dashes in this latter representation of the phantom sequence indicate the locations of the insertions/deletions and mismatches in the phantom sequence relative to the parent sequences from which it was derived. Thus, the "T" marked with an asterisk in Sequence 1, the "A" marked with a diamond in Sequence 2 and the "A-T" mismatch of Sequences 1 and 2 marked with two dots were deleted in generating the phantom sequence.
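Because a phantom sequence is any string of bases occurring in the same order in both parents, it is what is usually called a common subsequence, and the longest possible phantom for a pair is their longest common subsequence. A sketch (the function names are illustrative) that confirms the phantom above is a common subsequence of both parents and that no longer one exists:

```python
def is_subsequence(short: str, long: str) -> bool:
    """True if the bases of `short` occur in the same order within `long`."""
    it = iter(long)
    return all(base in it for base in short)  # `in` consumes the iterator


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


seq1 = "ATGTTTAGTGAAAAGTTAGTATTG"
seq2 = "ATGTTAGTGAATAGTATAGTATTG"
phantom = "ATGTTAGTGAAAGTTAGTATTG"

print(is_subsequence(phantom, seq1), is_subsequence(phantom, seq2))  # True True
print(len(phantom), lcs_length(seq1, seq2))                          # 22 22
```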

A person skilled in the art will appreciate that the term "insertion/deletion" is intended to cover the situations indicated by the asterisk and diamond. Whether the change is considered, strictly speaking, an insertion or a deletion is merely a matter of vantage point. That is, one can see that the fourth base of Sequence 1 can be deleted therefrom to obtain the phantom sequence, or a "T" can be inserted after the third base of the phantom sequence to obtain Sequence 1.

One can thus see that, if it were possible to create a phantom sequence by eliminating a single insertion/deletion from one of the parent sequences, the two parent sequences would be identical over the length of the phantom sequence except for the presence of a single base in one of the two sequences being compared. Likewise, if it were possible to create a phantom sequence by deleting a mismatched pair of bases, one base in each parent, the two parent sequences would be identical over the length of the phantom sequence except for the presence of a single base in each of the sequences being compared. For this reason, the effect of an insertion/deletion is considered equivalent to the effect of a mismatched pair of bases when comparing the homology of two sequences.

Once a phantom sequence is generated, the compatibility of the pair of complements from which it was generated within a family of complements can be systematically evaluated.

According to one embodiment of the invention, a pair of complements is compatible for inclusion within a family of complements if any phantom sequence generated from the pair of complements has the following properties:

-   (1) Any consecutive sequence of bases in the phantom sequence which is identical to a consecutive sequence of bases in each of the first and second complements from which it is generated is no more than ((¾×L)−1) bases in length;
-   (2) The phantom sequence, if greater than or equal to (⅚×L) in length, contains at least 3 insertions/deletions or mismatches when compared to the first and second complements from which it is generated; and
-   (3) The phantom sequence is not greater than or equal to ((11/12)×L) in length.

Here, L₁ is the length of the first complement, L₂ is the length of the second complement, and L=L₁, or if L₁≠L₂, L is the greater of L₁ and L₂.
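One way to automate these three requirements is sketched below. This is an interpretation, not a procedure stated in the specification: the longest common subsequence is taken as the longest possible phantom (for requirement (3) and for deciding whether (2) applies), the longest common substring is used for (1), and the Levenshtein edit distance (each substitution, insertion or deletion counting 1) stands in for the number of insertions/deletions or mismatches in (2). All function names are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence: the longest possible phantom sequence."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Longest run of consecutive bases shared by both sequences."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b):
            if x == y:
                cur[j + 1] = prev[j] + 1
                best = max(best, cur[j + 1])
        prev = cur
    return best


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def phantom_criteria(c1: str, c2: str) -> tuple[bool, bool, bool]:
    """Evaluate requirements (1)-(3) for a pair of complements (one interpretation)."""
    L = max(len(c1), len(c2))
    p = lcs_length(c1, c2)
    # (1) shared consecutive run no longer than (3/4)L - 1, in integer form.
    ok1 = 4 * longest_common_substring(c1, c2) <= 3 * L - 4
    # (2) if a phantom can reach (5/6)L, at least 3 indels/mismatches are needed.
    ok2 = 6 * p < 5 * L or edit_distance(c1, c2) >= 3
    # (3) no phantom reaches (11/12)L.
    ok3 = 12 * p < 11 * L
    return ok1, ok2, ok3


seq1 = "ATGTTTAGTGAAAAGTTAGTATTG"
seq2 = "ATGTTAGTGAATAGTATAGTATTG"
print(phantom_criteria(seq1, seq2))  # (True, True, False)
```

For the example Sequences 1 and 2 (L = 24), the longest shared run is 8 bases and the edit distance is 3, so requirements (1) and (2) are met; but the 22-base phantom equals (11/12)×24, so under this interpretation the pair fails requirement (3).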

In particularly preferred embodiments of the invention, all pairs of complements of a given set have the properties set out above. Under particular circumstances, it may be advantageous to have a limited number of complements that do not meet all of these requirements when compared to every other complement in a family.

In one case, for any first complement there are at most two second complements in the family which do not meet all three listed requirements. For two such complements, there would thus be a greater chance of cross-hybridization between their tag counterparts and the first complement. In another case, for any first complement there is at most one second complement which does not meet all three listed requirements.

It is also possible, given this invention, to design a family of complements in which a specific number or specific proportion of the complements do not meet the three listed requirements. For example, a set could be designed where only one pair of complements within the set do not meet the requirements when compared to each other. There could be two pairs, three pairs, or any number of pairs up to and including all possible pairs. Alternatively, it may be advantageous to have a given proportion of pairs of complements, say 10% of pairs, that do not meet one or more of the three listed requirements when compared with each other. This proportion could instead be 5%, 15%, 20%, 25%, 30%, 35%, or 40%.
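Given any pairwise test, the per-member limit (at most one or two failing partners) and the whole-family proportion of failing pairs can be tallied. In this sketch `fails` is a hypothetical predicate returning True when a pair does not meet the requirements; the toy predicate used here (fewer than two mismatches between 4mers) is only a placeholder for the three phantom-sequence requirements, and the family is illustrative.

```python
from itertools import combinations
from typing import Callable


def failure_profile(family: list[str],
                    fails: Callable[[str, str], bool]) -> tuple[int, float]:
    """Return (largest number of failing partners of any one member,
    fraction of all pairs that fail)."""
    partners = [0] * len(family)
    pairs = list(combinations(range(len(family)), 2))
    failing = 0
    for i, j in pairs:
        if fails(family[i], family[j]):
            failing += 1
            partners[i] += 1
            partners[j] += 1
    return max(partners), failing / len(pairs)


def toy_fails(a: str, b: str) -> bool:
    """Placeholder test: fewer than two position-by-position mismatches."""
    return sum(x != y for x, y in zip(a, b)) < 2


family = ["GATT", "GATG", "TGAT", "AAAG"]
worst, fraction = failure_profile(family, toy_fails)
print(worst, round(fraction, 3))  # 1 0.167
# e.g. "at most one failing partner per member" and "no more than 20% of pairs"
print(worst <= 1 and fraction <= 0.20)  # True
```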

The foregoing comparisons would generally be carried out largely using appropriate computer software. Although notionally described in terms of a phantom sequence for the sake of clarity and understanding, it will be understood that a competent computer programmer can carry out pairwise comparisons of complements in any number of ways using logical steps that obtain equivalent results.

The symbols A, G, T/U, C take on their usual meaning in the art here. In the case of T and U, a person skilled in the art would understand that these are equivalent to each other with respect to the inter-strand hydrogen-bond (Watson-Crick) binding properties at work in the context of this invention. The two bases are thus interchangeable, hence the designation T/U.

Analogues of the naturally occurring bases can be inserted in their respective places where desired. An analogue is any non-natural base, such as peptide nucleic acids and the like, that undergoes normal Watson-Crick pairing in the same way as the naturally occurring nucleotide base to which it corresponds.

In one broad aspect, the present invention is thus a composition comprising molecules for use as tags or tag complements, wherein each molecule comprises an oligonucleotide selected from a set of oligonucleotides based on a group of sequences having numeric patterns as set out in Table IA, wherein:

-   (A) each of 1 to 22 is a 4mer selected from the group of 4mers consisting of WWWW, WWWX, WWWY, WWXW, WWXX, WWXY, WWYW, WWYX, WWYY, WXWW, WXWX, WXWY, WXXW, WXXX, WXXY, WXYW, WXYX, WXYY, WYWW, WYWX, WYWY, WYXW, WYXX, WYXY, WYYW, WYYX, WYYY, XWWW, XWWX, XWWY, XWXW, XWXX, XWXY, XWYW, XWYX, XWYY, XXWW, XXWX, XXWY, XXXW, XXXX, XXXY, XXYW, XXYX, XXYY, XYWW, XYWX, XYWY, XYXW, XYXX, XYXY, XYYW, XYYX, XYYY, YWWW, YWWX, YWWY, YWXW, YWXX, YWXY, YWYW, YWYX, YWYY, YXWW, YXWX, YXWY, YXXW, YXXX, YXXY, YXYW, YXYX, YXYY, YYWW, YYWX, YYWY, YYXW, YYXX, YYXY, YYYW, YYYX, and YYYY; and
-   (B) each of 1 to 22 is selected so as to be different from all of the others of 1 to 22;
-   (C) each of W, X and Y is a base in which either (i) or (ii) is true:
    -   (i) (a) W=one of A, T/U, G, and C; X=one of A, T/U, G, and C; Y=one of A, T/U, G, and C; and each of W, X and Y is selected so as to be different from all of the others of W, X and Y; and (b) an unselected said base of (i)(a) can be substituted any number of times for any one of W, X and Y;
    -   (ii) (a) W=G or C; X=A or T/U; Y=A or T/U; and X≠Y; and (b) a base not selected in (ii)(a) can be inserted into each sequence at one or more locations, the location of each insertion being the same in all the sequences;
-   (D) up to three bases can be inserted at any location of any of the sequences, or up to three bases can be deleted from any of the sequences;
-   (E) all of the sequences of a said group of oligonucleotides are read 5′ to 3′ or are read 3′ to 5′; and

wherein each oligonucleotide of a said set has a sequence of at least ten contiguous bases of the sequence on which it is based, provided that:

-   (F) (I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.1 and 0.40, and said quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.2; and
    -   (II) for any phantom sequence generated from any pair of first and second sequences of the set, L₁ and L₂ in length, respectively, by selection from the first and second sequences of identical bases in identical sequence with each other:
        -   (i) any consecutive sequence of bases in the phantom sequence which is identical to a consecutive sequence of bases in each of the first and second sequences from which it is generated is less than ((¾×L)−1) bases in length;
        -   (ii) the phantom sequence, if greater than or equal to (⅚×L) in length, contains at least three insertions/deletions or mismatches when compared to the first and second sequences from which it is generated; and
        -   (iii) the phantom sequence is not greater than or equal to ((11/12)×L) in length;
        -   where L=L₁, or if L₁≠L₂, where L is the greater of L₁ and L₂;

and wherein any base present may be substituted by an analogue thereof.
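Requirement (F)(I) is a simple composition check. A minimal sketch, assuming the "about 0.1 and 0.40" bounds are applied as inclusive limits and that sequences contain only A, T/U, G and C; the helper names are hypothetical. The two 24mers are illustrative sequences decoded from Table IA rows 1 and 174 with W=G, X=A and Y=T; each contains six G residues and no C, a quotient of 6/24 = 0.25.

```python
def gc_fraction(seq: str) -> float:
    """Quotient (G + C) / (A + T/U + G + C) for one sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc_within_limits(sequences: list[str], low: float = 0.1,
                     high: float = 0.40, max_dev: float = 0.2) -> bool:
    """Check requirement (F)(I), treating the bounds as inclusive."""
    overall = gc_fraction("".join(sequences))
    if not low <= overall <= high:
        return False
    return all(abs(gc_fraction(s) - overall) <= max_dev for s in sequences)


tags = ["GATTTGTATTGATTGAGATTAAAG", "GTAAGATGTTGATATAGAAGATTA"]
print(gc_fraction(tags[0]), gc_within_limits(tags))  # 0.25 True
print(gc_within_limits(tags + ["G" * 24]))            # False: combined quotient is 0.5
```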

In a preferred embodiment, a set of oligonucleotides of the invention is based on the numeric patterns of the sequences tested in Example 2.

Preferably,

-   (G) for the group of 24mer sequences in which each 1=GATT, each 2=TGAT, each 3=AAAG, each 4=TGTA, each 5=GTAT, each 6=TTGA, each 7=TGAA, each 8=GTAA, each 9=ATTG, each 10=ATGA, each 11=TTAG, each 12=GTTA, each 13=ATAG, each 14=GTTT, each 15=GATG, each 16=GTAG, each 17=GAAG, each 18=GTTG, each 19=ATTA, each 20=TATA, each 21=TAAT and each 22=ATAT, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement.

It can thus be seen that it is possible to determine routinely whether all oligonucleotides of a selected set are minimally cross-hybridizing. Preferably in (G), under said defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 30% of the degree of hybridization between said sequence and its complement, it is also true that the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 10, more preferably between 1 and 9, and more preferably between 1 and 8. It is demonstrated in Example 2, below, for a preferred set of oligonucleotides, that the degree of hybridization between each sequence and its specific complement varies by a factor of between 1 and 8.25, and that the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 10.2% of the degree of hybridization between the sequence and its specific complement.
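The two figures of merit used above (the largest cross-hybridization signal expressed as a percentage of the corresponding specific signal, and the factor by which the specific signals vary) can be computed from a matrix of hybridization measurements. The sketch below assumes a hypothetical layout in which `signal[i][j]` is the signal of sequence i with the complement of sequence j; the numbers in the example are invented for illustration and are not data from Example 2.

```python
from typing import Sequence


def cross_hybridization_summary(signal: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Return (worst cross-hybridization as % of the specific signal,
    max/min ratio of the specific signals).

    signal[i][j] is the hybridization signal of sequence i with the
    complement of sequence j; the diagonal holds the specific signals.
    """
    n = len(signal)
    specific = [signal[i][i] for i in range(n)]
    worst_pct = max(
        100 * signal[i][j] / signal[i][i]
        for i in range(n)
        for j in range(n)
        if i != j
    )
    spread = max(specific) / min(specific)
    return worst_pct, spread


# Invented 3 x 3 example, for illustration only.
example = [
    [100.0, 5.0, 2.0],
    [4.0, 50.0, 10.0],
    [1.0, 3.0, 80.0],
]
print(cross_hybridization_summary(example))  # (20.0, 2.0)
```

A set would match the figures reported for Example 2 if the first returned value did not exceed 10.2 and the second did not exceed 8.25.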

Preferably, the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 25%, more preferably does not exceed 20%, more preferably does not exceed 15%, and more preferably does not exceed 11%.

Preferably, under the defined set of conditions of (G), the maximum degree of hybridization between a sequence and a complement of any other sequence of the set is no more than 15% greater than the maximum degree of hybridization between a sequence and any complement of a different sequence of the said group of 24mer sequences, more preferably no more than 10% greater, and more preferably no more than 5% greater.

According to Example 2, described below, under conditions of 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C., the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 10.2% when the 24mer nucleotide sequences are covalently linked to a solid support, in this case microparticles or beads.

In another preferred aspect of the composition, in (G), for the group of 24mers the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 15% of the degree of hybridization between said sequence and its complement, and the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 9; and, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 20% of the degree of hybridization of the oligonucleotide and its complement.

In a preferred aspect, each of the 4mers represented by numerals 1 to 22 is selected from the group of 4mers consisting of WXXX, WXXY, WXYX, WXYY, WYXX, WYXY, WYYX, WYYY, XWXX, XWXY, XWYX, XWYY, XXWX, XXWY, XXXW, XXYW, XYWX, XYWY, XYXW, XYYW, YWXX, YWXY, YWYX, YWYY, YXWX, YXWY, YXXW, YXYW, YYWX, YYWY, YYXW, and YYYW.
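The 81 4mers listed in (A) are every possible string of four symbols drawn from W, X and Y (3⁴ = 81), and the 32 4mers listed in this preferred aspect are exactly those containing a single W (4 positions × 2³ = 32). When W is G or C and X and Y are A or T/U, each such 4mer therefore has one G or C in four bases, which corresponds to a quotient of 0.25 in (F)(I). A short Python check of both observations:

```python
from itertools import product

# Every 4mer over the symbolic alphabet, in the same order as listed in (A).
ALL_4MERS = ["".join(p) for p in product("WXY", repeat=4)]

# The subset containing exactly one W, in the same order as listed above.
ONE_W_4MERS = [m for m in ALL_4MERS if m.count("W") == 1]

print(len(ALL_4MERS), len(ONE_W_4MERS))  # 81 32
```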

In another aspect, for each of the numerals 1 to 22, the 4mers represented by that numeral are identical to each other.

In another aspect, for one or more of the numerals 1 to 22, at least one of the 4mers represented by that numeral has the following sequence: WXYY for numeral 1, YWXY for numeral 2, XXXW for numeral 3, YWYX for numeral 4, WYXY for numeral 5, YYWX for numeral 6, YWXX for numeral 7, WYXX for numeral 8, XYYW for numeral 9, XYWX for numeral 10, YYXW for numeral 11, WYYX for numeral 12, XYXW for numeral 13, WYYY for numeral 14, WXYW for numeral 15, WYXW for numeral 16, WXXW for numeral 17, WYYW for numeral 18, XYYX for numeral 19, YXYX for numeral 20, YXXY for numeral 21, and XYXY for numeral 22.

In one preferred aspect, the invention is a composition in which each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, each 10=XYWX, each 11=YYXW, each 12=WYYX, each 13=XYXW, each 14=WYYY, each 15=WXYW, each 16=WYXW, each 17=WXXW, each 18=WYYW, each 19=XYYX, each 20=YXYX, each 21=YXXY and each 22=XYXY.

In one broad aspect, the invention is a composition wherein a group of sequences is based on the sequences having the numeric patterns of those with numeric identifiers 1 to 173 of Table IA, and wherein each of the 4mers represented by numerals 1 to 14 in (A) is selected from the group of 4mers consisting of WXYY, YWXY, XXXW, YWYX, WYXY, YYWX, YWXX, WYXX, XYYW, XYWX, YYXW, WYYX, XYXW, and WYYY.

In such a composition it is preferred that, for one or more of the numerals 1 to 14, the 4mers represented by that numeral are identical to each other.

It is also preferred that, for one or more of the numerals 1 to 14, at least one of the 4mers represented by that numeral has the following sequence: WXYY for numeral 1, YWXY for numeral 2, XXXW for numeral 3, YWYX for numeral 4, WYXY for numeral 5, YYWX for numeral 6, YWXX for numeral 7, WYXX for numeral 8, XYYW for numeral 9, XYWX for numeral 10, YYXW for numeral 11, WYYX for numeral 12, XYXW for numeral 13, and WYYY for numeral 14.

More preferably, each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, each 10=XYWX, each 11=YYXW, each 12=WYYX, each 13=XYXW, and each 14=WYYY.

In another broad aspect, the invention is a composition in which a group of sequences is based on the sequences having the numeric patterns of those with sequence identifiers 1 to 100 set out in Table IA, and wherein each of the 4mers represented by numerals 1 to 10 in (A) is selected from the group of 4mers consisting of WXYY, YWXY, XXXW, YWYX, WYXY, YYWX, YWXX, WYXX, XYYW, and XYWX.

In such a composition it is preferred that, for one or more of the numerals 1 to 10, the 4mers represented by that numeral are identical to each other.

It is also preferred that, for one or more of the numerals 1 to 10, at least one of the 4mers represented by that numeral has the following sequence: WXYY for numeral 1, YWXY for numeral 2, XXXW for numeral 3, YWYX for numeral 4, WYXY for numeral 5, YYWX for numeral 6, YWXX for numeral 7, WYXX for numeral 8, XYYW for numeral 9, and XYWX for numeral 10.

More preferably, each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, and each 10=XYWX.

In the most preferred compositions, in (C)(i)(a), W=one of G and C; X=one of A and T/U; and Y=one of A and T/U, subject to the provisos of (F). More preferably, in (C)(i)(a), W=G; X=one of A and T/U; and Y=one of A and T/U. Even more preferably, W=G; X=A; and Y=T/U.
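With W=G, X=A and Y=T, the symbolic 4mers assigned to numerals 1 to 22 above become the base 4mers of the reference group in (G) (1=GATT, 2=TGAT, and so on). The Python sketch below performs that substitution and, assuming each 24mer is written as six numerals, assembles one sequence. The pattern `[1, 2, 3, 4, 5, 6]` is a hypothetical example chosen for illustration, not a pattern taken from Table IA, and the helper names are likewise hypothetical.

```python
# Symbolic 4mer assigned to each numeral, as listed above for numerals 1 to 22.
SYMBOLIC = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY", 6: "YYWX",
    7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX", 11: "YYXW", 12: "WYYX",
    13: "XYXW", 14: "WYYY", 15: "WXYW", 16: "WYXW", 17: "WXXW", 18: "WYYW",
    19: "XYYX", 20: "YXYX", 21: "YXXY", 22: "XYXY",
}


def to_bases(symbolic: str, w: str = "G", x: str = "A", y: str = "T") -> str:
    """Replace the symbols W, X and Y with concrete bases (pass y="U" for RNA)."""
    return symbolic.translate(str.maketrans({"W": w, "X": x, "Y": y}))


def build_sequence(pattern: list[int], **bases: str) -> str:
    """Concatenate the base 4mers for a numeric pattern (hypothetical helper)."""
    return "".join(to_bases(SYMBOLIC[n], **bases) for n in pattern)


print(to_bases(SYMBOLIC[1]), to_bases(SYMBOLIC[1], y="U"))  # GATT GAUU
print(build_sequence([1, 2, 3, 4, 5, 6]))  # GATTTGATAAAGTGTAGTATTTGA
```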

A person skilled in the art will appreciate that the closer a given oligonucleotide sequence variant is to one of the most preferred sequences (Table I), the more closely it will resemble the preferred sequence as a member of a minimally cross-hybridizing set of oligonucleotides.

It will be understood that when it is stated herein that a group of sequences (oligonucleotides) is minimally cross-hybridizing, it is meant that any given member of the group of sequences (oligonucleotides) only minimally hybridizes with the complement of any other sequence (oligonucleotide) of that group.

Preferably, in (F)(I), the quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.1, more preferably by not more than 0.05, and more preferably by not more than 0.01.

Also, it is preferred in (F)(I) that the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.15 and 0.35, more preferably between about 0.2 and 0.3, more preferably between about 0.21 and 0.29, more preferably between about 0.22 and 0.28, more preferably between about 0.23 and 0.27, even more preferably between about 0.24 and 0.26, and most preferably the quotient is 0.25.
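The quotient in (F)(I) is the G+C fraction, computed once for all sequences pooled and once per sequence. The minimal Python sketch below treats the bounds as inclusive (an assumption, since the text says "between about"); its defaults are the broad values of (F)(I), the function names are hypothetical, and the two example sequences are simply assembled from the (G) 4mers for illustration.

```python
def gc_quotient(seq: str) -> float:
    """(G + C) / (A + T/U + G + C) for one sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = sum(seq.count(b) for b in "ATUGC")
    return gc / total


def meets_gc_proviso(seqs: list[str], low: float = 0.10, high: float = 0.40,
                     max_dev: float = 0.2) -> tuple[float, bool]:
    """Return the quotient for the combined set and whether (F)(I) holds."""
    combined = gc_quotient("".join(seqs))
    in_range = low <= combined <= high
    worst_dev = max(abs(gc_quotient(s) - combined) for s in seqs)
    return combined, in_range and worst_dev <= max_dev


tags = ["GATTTGATAAAGTGTAGTATTTGA", "TGATGATTGTATAAAGTTGATGTA"]
print(meets_gc_proviso(tags))  # (0.25, True)
```

Lowering `max_dev` to 0.1, 0.05 or 0.01, or narrowing `low` and `high`, expresses the more preferred ranges.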

Preferably, in (D), up to two bases can be inserted at any location of any of the sequences or up to two bases can be deleted from any of the sequences; more preferably, only one base can be inserted at any location of any of the sequences or one base can be deleted from any of the sequences; and most preferably, no base is inserted at any location of any of the sequences.

Also, it is preferred that in (D), no base can be deleted from any of the sequences, and most preferably, in (D), no base can be inserted at or deleted from any location of any of the sequences.

In preferred compositions, each of the oligonucleotides of a set has a sequence of at least eleven contiguous bases of the sequence on which it is based; more preferably, at least twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, twenty-one, twenty-two, twenty-three, or twenty-four contiguous bases of that sequence, each successively larger minimum being more preferred.

Preferably, each of the oligonucleotides of a set is up to thirty bases in length; more preferably, up to twenty-nine, twenty-eight, twenty-seven, twenty-six, twenty-five, or twenty-four bases in length, each successively smaller maximum being more preferred.

In certain preferred embodiments, each of the oligonucleotides of a set has a length within five bases of the average length of all of the oligonucleotides in the set; more preferably, within four, three, two, or one base(s) of that average, each successively smaller tolerance being more preferred.

Preferably, the string of contiguous bases of each oligonucleotide of a said set is selected such that the position of the first base of each string within the sequence on which it is based is the same for all oligonucleotides of the set.
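Taking each oligonucleotide as a window of the same length that starts at a common offset in its parent sequence satisfies both this preference and the minimum of ten contiguous bases. A sketch with a hypothetical function name and 0-based offsets; the parent sequences are illustrative 24mers built from the (G) 4mers.

```python
def common_offset_windows(parents: list[str], start: int, length: int) -> list[str]:
    """Take `length` contiguous bases from every parent sequence, beginning at
    the same 0-based position `start` in each one."""
    if length < 10:
        raise ValueError("each oligonucleotide needs at least ten contiguous bases")
    windows = []
    for parent in parents:
        if start < 0 or start + length > len(parent):
            raise ValueError(f"window [{start}, {start + length}) exceeds {parent!r}")
        windows.append(parent[start:start + length])
    return windows


print(common_offset_windows(["GATTTGATAAAGTGTAGTATTTGA",
                             "TGATGATTGTATAAAGTTGATGTA"], start=4, length=12))
# ['TGATAAAGTGTA', 'GATTGTATAAAG']
```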

In preferred embodiments, the composition includes at least ten said molecules, or at least any whole number of said molecules from eleven to fifty, or at least sixty, seventy, eighty, ninety, or one hundred said molecules, or, depending upon the size of the group of sequences on which the oligonucleotides are based, at least one hundred and ten, one hundred and twenty, one hundred and thirty, one hundred and forty, one hundred and fifty, one hundred and sixty, one hundred and seventy, one hundred and eighty, one hundred and ninety, or two hundred said molecules.

A person skilled in the art will appreciate that, depending upon the use to which a family of oligonucleotides of the invention is to be put, it may or may not be desirable to include, with sequences that can be distinguished one from the other (i.e., are minimally cross-hybridizing), a number of sequences that do cross-hybridize with each other.

In a preferred aspect, the invention is a composition wherein, in (II)(i), any consecutive sequence of bases in the phantom sequence which is identical to a consecutive sequence of bases in each of the first and second sequences from which it is generated is no more than ((⅔×L)−1) bases in length. More preferably, the phantom sequence, if greater than or equal to (¾×L) in length, contains at least 3 insertions/deletions or mismatches when compared to the first and second sequences from which it is generated, and even more preferably, the phantom sequence, if greater than or equal to (⅔×L) in length, contains at least 3 insertions/deletions or mismatches when compared to the first and second sequences from which it is generated.

In another preferred aspect, in (II)(iii), the phantom sequence is not greater than or equal to (⅚×L) in length; more preferably, the phantom sequence is not greater than or equal to (¾×L) in length.

In another broad aspect, the invention is a composition containing molecules for use as tags or tag complements wherein each molecule comprises an oligonucleotide selected from a set of oligonucleotides based on a group of sequences having the numeric patterns of the sequences tested in Example 2, as set out in Table IA, wherein:

-   (A) each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, each 10=XYWX, each 11=YYXW, each 12=WYYX, each 13=XYXW, each 14=WYYY, each 15=WXYW, each 16=WYXW, each 17=WXXW, each 18=WYYW, each 19=XYYX, each 20=YXYX, each 21=YXXY and each 22=XYXY;
-   (B) each of W, X and Y is a base for which either:
    -   (i) (a) W=one of A, T/U, G, and C; X=one of A, T/U, G, and C; and Y=one of A, T/U, G, and C; each of W, X and Y being selected so as to be different from all of the others of W, X and Y; and
        (b) an unselected said base of (i)(a) can be substituted any number of times for any one of W, X and Y; or
    -   (ii) (a) W=G or C; X=A or T/U; Y=A or T/U; and X≠Y; and
        (b) a base not selected in (ii)(a) can be inserted into each sequence at one or more locations, the location of each insertion being the same in all the sequences;
-   (C) up to three bases can be inserted at any location of any of the sequences or up to three bases can be deleted from any of the sequences;
-   (D) all of the sequences of a said group of oligonucleotides are read 5′ to 3′ or are read 3′ to 5′; and
    wherein each oligonucleotide of a said set has a sequence of at least ten contiguous bases of the sequence on which it is based, provided that:
-   (E) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.1 and 0.40, and said quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.2; and
-   (F) for the group of 24mer sequences in which each 1=GATT, each 2=TGAT, each 3=AAAG, each 4=TGTA, each 5=GTAT, each 6=TTGA, each 7=TGAA, each 8=GTAA, each 9=ATTG, each 10=ATGA, each 11=TTAG, each 12=GTTA, each 13=ATAG, each 14=GTTT, each 15=GATG, each 16=GTAG, each 17=GAAG, each 18=GTTG, each 19=ATTA, each 20=TATA, each 21=TAAT and each 22=ATAT, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement;
    wherein any base present may be substituted by an analogue thereof.

Again, preferably, the contiguous bases of each oligonucleotide of a set are selected such that the position of the first base of each oligonucleotide within the sequence on which it is based is the same for all oligonucleotides of the set.

In a preferred aspect, subject to the provisos of (E) and (F) above, each oligonucleotide of a said set comprises a said sequence of twenty-four contiguous bases of the sequence on which it is based.

More preferably, subject to the proviso of (F), each oligonucleotide of a said set comprises a said sequence of twenty-four contiguous bases of the sequence on which it is based.

In particularly preferred aspects, in (B), W=one of G and C; X=one of A and T/U; and Y=one of A and T/U.

Even more preferably, in (B), W=G; X=one of A and T/U; and Y=one of A and T/U.

In another broad aspect, the invention is a composition that includes fifty minimally cross-hybridizing molecules for use as tags or tag complements wherein each molecule comprises an oligonucleotide comprising a sequence of nucleotide bases for which, under a defined set of conditions, the maximum degree of hybridization between a said oligonucleotide and any complement of a different oligonucleotide does not exceed about 10% of the degree of hybridization between said oligonucleotide and its complement.

A preferred set of such defined conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C., and the sequences are covalently linked to microparticles. Of course, these conditions are preferably used directly.

Preferably, under the defined set of conditions, whatever the conditions are, the degree of hybridization between each oligonucleotide and its complement varies by a factor of between 1 and 8.

Preferably, each oligonucleotide is the same length and is at least twenty nucleotide bases in length. More preferably, each oligonucleotide is twenty-four nucleotide bases in length.

In certain embodiments, each molecule of a composition is linked to a solid phase support so as to be distinguishable from a mixture of said molecules by hybridization to its complement. Each such molecule can be linked to a defined location on such a solid phase support, the defined location for each molecule being different from the defined location for other, different, molecules.

In one preferred embodiment, the solid phase support is a microparticle, and each said molecule is covalently attached to a microparticle different from the microparticle to which each other, different, said molecule is attached.

The invention includes kits for sorting and identifying polynucleotides. Such a kit can include one or more solid phase supports, each having one or more spatially discrete regions, each such region having a uniform population of substantially identical tag complements covalently attached. The tag complements are made up of a set of oligonucleotides of the invention.

The one or more solid phase supports can be a planar substrate in which the one or more spatially discrete regions are a plurality of spatially addressable regions.

The tag complements can also be coupled to microparticles. Microparticles preferably each have a diameter in the range of from 5 to 40 μm.

Such a kit preferably includes microparticles that are spectrophotometrically unique, and therefore distinguishable from each other according to conventional laboratory techniques. Of course, for such kits to work, each type of microparticle would generally have only one tag complement associated with it, and usually there would be a different oligonucleotide tag complement associated with (attached to) each type of microparticle.

The invention includes methods of using families of oligonucleotides of the invention.

One such method is a method of analyzing a biological sample containing a nucleic acid molecule for the presence of a mutation or polymorphism at a locus of the nucleic acid molecule. The method includes:

-   (A) amplifying the nucleic acid molecule in the presence of a first primer having a 5′-sequence having the sequence of a tag complementary to the sequence of a tag complement belonging to a family of tag complements of the invention, to form an amplified molecule with a 5′-end with a sequence complementary to the sequence of the tag;
-   (B) extending the amplified molecule in the presence of a polymerase and a second primer having a 5′-end complementary to the 3′-end of the amplified sequence, with the 3′-end of the second primer extending to immediately adjacent said locus, in the presence of a plurality of nucleoside triphosphate derivatives, each of which: (i) is capable of incorporation during transcription by the polymerase onto the 3′-end of a growing nucleotide strand; (ii) causes termination of polymerization; and (iii) is capable of differential detection, one from the other, wherein there is a said derivative complementary to each possible nucleotide present at said locus of the amplified sequence;
-   (C) specifically hybridizing the second primer to a tag complement having the tag complement sequence of (A); and
-   (D) detecting the nucleotide derivative incorporated into the second primer in (B) so as to identify the base located at the locus of the nucleic acid.

In another method of the invention, a biological sample containing a plurality of nucleic acid molecules is analyzed for the presence of a mutation or polymorphism at a locus of each nucleic acid molecule. This method includes, for each nucleic acid molecule, the steps of:

-   (A) amplifying the nucleic acid molecule in the presence of a first
    primer having a 5′-sequence having the sequence of a tag
    complementary to the sequence of a tag complement belonging to a
    family of tag complements of the invention, to form an amplified
    molecule with a 5′-end with a sequence complementary to the sequence
    of the tag;
-   (B) extending the amplified molecule in the presence of a polymerase
    and a second primer having a 5′-end complementary to the 3′-end of
    the amplified sequence, the 3′-end of the second primer extending to
    immediately adjacent said locus, in the presence of a plurality of
    nucleoside triphosphate derivatives each of which: (i) is capable of
    incorporation during transcription by the polymerase onto the 3′-end
    of a growing nucleotide strand; (ii) causes termination of
    polymerization; and (iii) is capable of differential detection, one
    from the other, wherein there is a said derivative complementary to
    each possible nucleotide present at said locus of the amplified
    molecule;
-   (C) specifically hybridizing the second primer to a tag complement
    having the tag complement sequence of (A); and
-   (D) detecting the nucleotide derivative incorporated into the second
    primer in (B) so as to identify the base located at the locus of the
    nucleic acid;
    wherein each tag of (A) is unique for each nucleic acid molecule and
    steps (A) and (B) are carried out with said nucleic acid molecules
    in the presence of each other.

Another method includes analyzing a biological sample that contains a plurality of double-stranded complementary nucleic acid molecules for the presence of a mutation or polymorphism at a locus of each nucleic acid molecule. The method includes, for each nucleic acid molecule, steps of:

-   (A) amplifying the double-stranded molecule in the presence of a
    pair of first primers, each primer having an identical 5′-sequence
    having the sequence of a tag complementary to the sequence of a tag
    complement belonging to a family of tag complements of the
    invention, to form amplified molecules with 5′-ends with a sequence
    complementary to the sequence of the tag;
-   (B) extending the amplified molecules in the presence of a
    polymerase and a pair of second primers, each second primer having a
    5′-end complementary to a 3′-end of the amplified sequence, the
    3′-end of each said second primer extending to immediately adjacent
    said locus, in the presence of a plurality of nucleoside
    triphosphate derivatives each of which: (i) is capable of
    incorporation during transcription by the polymerase onto the 3′-end
    of a growing nucleotide strand; (ii) causes termination of
    polymerization; and (iii) is capable of differential detection, one
    from the other;
-   (C) specifically hybridizing each of the second primers to a tag
    complement having the tag complement sequence of (A); and
-   (D) detecting the nucleotide derivative incorporated into the second
    primers in (B) so as to identify the base located at said locus;
    wherein the sequence of each tag of (A) is unique for each nucleic
    acid molecule and steps (A) and (B) are carried out with said
    nucleic acid molecules in the presence of each other.

In yet another aspect, the invention is a method of analyzing a biological sample containing a plurality of nucleic acid molecules for the presence of a mutation or polymorphism at a locus of each nucleic acid molecule, the method including, for each nucleic acid molecule, steps of:

-   (a) hybridizing the molecule and a primer, the primer having a
    5′-sequence having the sequence of a tag complementary to the
    sequence of a tag complement belonging to a family of tag
    complements of the invention, and a 3′-end extending to immediately
    adjacent the locus;
-   (b) enzymatically extending the 3′-end of the primer in the presence
    of a plurality of nucleoside triphosphate derivatives each of which:
    (i) is capable of enzymatic incorporation onto the 3′-end of a
    growing nucleotide strand; (ii) causes termination of said
    extension; and (iii) is capable of differential detection, one from
    the other, wherein there is a said derivative complementary to each
    possible nucleotide present at said locus;
-   (c) specifically hybridizing the extended primer formed in step (b)
    to a tag complement having the tag complement sequence of (a); and
-   (d) detecting the nucleotide derivative incorporated into the primer
    in step (b) so as to identify the base located at the locus of the
    nucleic acid molecule;
    wherein each tag of (a) is unique for each nucleic acid molecule and
    steps (a) and (b) are carried out with said nucleic acid molecules
    in the presence of each other.
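The primers in the methods above are chimeric: a 5′ tag followed by a locus-specific portion whose 3′ end lies immediately adjacent to the locus, with the tag later captured by its tag complement. The sketch below assumes that a tag complement is the reverse complement of its tag and uses exact string matching as a stand-in for specific hybridization; the function names and the locus-specific sequence are hypothetical, while the two tags are 24mers quoted elsewhere in this document.

```python
from typing import Dict, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def make_chimeric_primer(tag: str, locus_specific: str) -> str:
    """5' tag followed by a locus-specific 3' portion ending next to the locus."""
    return tag + locus_specific


def capturing_complement(primer: str, tag_complements: Dict[str, str]) -> Optional[str]:
    """Name of the tag complement whose reverse complement is the primer's 5' tag, if any."""
    for name, complement in tag_complements.items():
        if primer.startswith(reverse_complement(complement)):
            return name
    return None


# Two 24mers quoted in this document; the locus-specific portion is hypothetical.
tags = {"locus1": "GATTTGTATTGATTGAGATTAAAG", "locus2": "TGATTGTAGTATGTATTGATAAAG"}
assert len(set(tags.values())) == len(tags)  # each molecule gets a unique tag
complements = {name: reverse_complement(tag) for name, tag in tags.items()}
primer = make_chimeric_primer(tags["locus2"], "CCTGAAGTCA")
print(capturing_complement(primer, complements))  # locus2
```

Because every tag is unique, each extended primer is routed to exactly one tag complement even when all reactions share one tube.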

The derivative can be a dideoxy nucleoside triphosphate.

Each respective complement can be attached as a uniform population of substantially identical complements in spatially discrete regions on one or more solid phase supports.

Each tag complement can include a label, each such label being different for respective complements, and step (d) can include detecting the presence of the different labels for respective hybridization complexes of bound tags and tag complements.

Another aspect of the invention includes a method of determining the presence of a target suspected of being contained in a mixture.

The method includes the steps of:

-   (i) labelling the target with a first label;
-   (ii) providing a first detection moiety capable of specific binding
    to the target and including a first tag;
-   (iii) exposing a sample of the mixture to the detection moiety under
    conditions suitable to permit (or cause) said specific binding of
    the moiety and target;
-   (iv) providing a family of suitable tag complements of the invention,
    wherein the family contains a first tag complement having a sequence
    complementary to that of the first tag;
-   (v) exposing the sample to the family of tag complements under
    conditions suitable to permit (or cause) specific hybridization of
    the first tag and its tag complement; and
-   (vi) determining whether a said first detection moiety hybridized to
    a first said tag complement is bound to a said labelled target, in
    order to determine the presence or absence of said target in the
    mixture.

Preferably, the first tag complement is linked to a solid support at a specific location of the support, and step (vi) includes detecting the presence of the first label at said specific location.

Also, the first tag complement can include a second label, and step (vi) includes detecting the presence of the first and second labels in a hybridized complex of the moiety and the first tag complement.

Further, the target can be selected from the group consisting of organic molecules, antigens, proteins, polypeptides, antibodies and nucleic acids. The target can be an antigen, and the first detection moiety can be an antibody specific for that antigen.

The antigen is usually a polypeptide or protein, and the labelling step can include conjugation of fluorescent molecules or digoxigenin, biotinylation and the like.

The target can be a nucleic acid, and the labelling step can include incorporation of fluorescent molecules, radiolabelled nucleotides, digoxigenin, biotinylation and the like.

DETAILED DESCRIPTION OF THE INVENTION

FIGURES

Reference is made to the attached figures in which,

FIGS. 1A and 1B illustrate results obtained in the cross-hybridization experiments described in Example 1. FIG. 1A shows the hybridization pattern found when a microarray containing all 100 probes (SEQ ID NOs:1 to 100) was hybridized with a 24mer oligonucleotide having the sequence complementary to SEQ ID NO:3 (target). FIG. 1B shows the pattern observed when a similar array was hybridized with a mix of all 100 targets, i.e., oligonucleotides having the sequences complementary to SEQ ID NOs:1 to 100.

FIG. 2 shows the intensity of the signal (MFI) for each perfectly matched sequence (indicated in Table I) and its complement, obtained as described in Example 2.

FIG. 3 is a three-dimensional representation of the cross-hybridization observed for the sequences of FIG. 2, as described in Example 2. The results shown in FIG. 2 are reproduced along the diagonal of the drawing.

FIG. 4 illustrates results obtained for an individual target (SEQ ID NO:23, target No. 16) when exposed to the 100 probes of Example 2. The MFI for each bead is plotted.

DETAILED EMBODIMENTS

The invention provides a family of minimally cross-hybridizing sequences. The invention includes a method for sorting complex mixtures of molecules by using families of the sequences as oligonucleotide sequence tags. The families of oligonucleotide sequence tags are designed to provide minimal cross-hybridization during the sorting process. Thus, under appropriate hybridization conditions known to those skilled in the art, no sequence within a family will cross-hybridize with any other sequence derived from that family. The invention is particularly useful in highly parallel processing of analytes.

Families of Oligonucleotide Sequence Tags

The present invention includes a family of 24mer polynucleotides that have been demonstrated to be minimally cross-hybridizing with each other. This family of polynucleotides is thus useful as a family of tags, and their complements as tag complements.

The oligonucleotide sequences that belong to families of sequences that do not exhibit cross-hybridization behavior can be derived by computer programs (described in international patent publication No. WO 01/59151). The programs use a method of generating a maximum number of minimally cross-hybridizing polynucleotide sequences that can be summarized as follows. First, a set of sequences of a given length is created from a given number of block elements. Thus, if a family of polynucleotide sequences 24 nucleotides (24mer) in length is desired from a set of 6 block elements, each element comprising 4 nucleotides, then a family of 24mers is generated considering all positions of the 6 block elements. In this case, each of the six 4-nucleotide positions can be filled by any of the 6 block elements, so there are 6⁶ (46,656) ways of assembling the 6 block elements to generate all possible polynucleotide sequences 24 nucleotides in length.
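The 6⁶ count follows from filling six 4-nucleotide positions independently with one of six blocks. The sketch below enumerates those assemblies; the six block sequences are hypothetical placeholders (one G and three A/T each), since the actual blocks are not listed in this section.

```python
from itertools import product

# Hypothetical block elements (4 nt each); not the blocks used for Table I.
BLOCKS = ["GATT", "TGAT", "ATGA", "TAAG", "GTTA", "AGTA"]
N_POSITIONS = 6  # 6 positions x 4 nt = 24mer


def assemble_all(blocks, n_positions):
    """Yield every sequence formed by filling each position with one block (repeats allowed)."""
    for combo in product(blocks, repeat=n_positions):
        yield "".join(combo)


candidates = list(assemble_all(BLOCKS, N_POSITIONS))
print(len(candidates))  # 46656 == 6 ** 6
print(candidates[0])    # GATTGATTGATTGATTGATTGATT
```

The constraints described next are then applied to this pool.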

Constraints are imposed on the sequences and are expressed as a set of rules on the identities of the blocks, such that the homology between any two sequences will not exceed the degree of homology desired between them. All generated polynucleotide sequences that obey the rules are saved. Sequence comparisons are performed to generate an incidence matrix. The incidence matrix is presented as a simple graph, and the sequences with the desired property of being minimally cross-hybridizing are found from a clique of the simple graph, which may have multiple cliques. Once a clique containing a suitably large number of sequences is found, the sequences are experimentally tested to determine whether they form a set of minimally cross-hybridizing sequences. This method has been used to obtain the 100 non-cross-hybridizing tags of Table I (SEQ ID NOs:1 to 100) that are the subject of this patent application.
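The incidence-matrix and clique step can be illustrated on a toy scale. This sketch assumes that "homology" is the number of positions at which two sequences carry the same block, and it finds a clique with a simple greedy heuristic; that heuristic is a stand-in, not necessarily the procedure of WO 01/59151, and a maximum clique (an NP-hard problem) would need a dedicated solver at the full 46,656-sequence scale. All names and the three-block example are hypothetical.

```python
from itertools import combinations, product


def block_homology(a, b):
    """Number of positions at which two block sequences carry the same block."""
    return sum(x == y for x, y in zip(a, b))


def incidence_matrix(seqs, max_shared):
    """adj[i][j] is True when sequences i and j share at most max_shared blocks."""
    n = len(seqs)
    adj = [[False] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        adj[i][j] = adj[j][i] = block_homology(seqs[i], seqs[j]) <= max_shared
    return adj


def greedy_clique(adj):
    """Heuristic: keep each vertex that is compatible with every vertex kept so far."""
    clique = []
    for v in range(len(adj)):
        if all(adj[v][u] for u in clique):
            clique.append(v)
    return clique


# Toy example: 3 positions, 3 hypothetical blocks (0, 1, 2); at most 1 shared position.
seqs = list(product(range(3), repeat=3))
adj = incidence_matrix(seqs, max_shared=1)
clique = greedy_clique(adj)
print([seqs[v] for v in clique])
```

The greedy result is maximal (no further sequence can be added) but not guaranteed to be the largest clique.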

The method includes a rational approach to the selection of the groups of sequences used to describe the blocks. For example, there are n⁴ different tetramers that can be obtained from n different nucleotides, non-standard bases or analogues thereof. In a more preferred embodiment, there are 4⁴, or 256, possible tetramers when the natural nucleotides are used. More preferably, there are 81 (3⁴) possible tetramers when only 3 bases, A, T and G, are used. Most preferably, there are 32 different tetramers when each tetramer contains exactly one G.
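The three tetramer counts can be checked by direct enumeration: 4⁴ over A, C, G and T; 3⁴ over A, T and G; and, for exactly one G, 4 choices of G position times 2³ choices of A or T elsewhere. A minimal sketch:

```python
from itertools import product


def tetramers(alphabet):
    """All 4-letter words over the given alphabet."""
    return ["".join(p) for p in product(alphabet, repeat=4)]


all_natural = tetramers("ACGT")
atg_only = tetramers("ATG")
one_g = [t for t in atg_only if t.count("G") == 1]
print(len(all_natural), len(atg_only), len(one_g))  # 256 81 32
```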

Block sequences can be composed of a subset of the natural bases, most preferably A, T and G. Sequences derived from blocks that are deficient in one base possess useful characteristics, for example, a reduced potential for secondary structure formation or for cross-hybridization with nucleic acids found in nature. The sets of block sequences most preferable for constructing families of non-cross-hybridizing tag sequences are those in which each block contributes approximately the same stability to formation of the correct duplex as every other block of the set. This should provide tag sequences that behave isothermally. It can be achieved, for example, by maintaining a constant base composition for all block sequences, such as one G and three A's or T's in each block. Preferably, non-cross-hybridizing sets of block sequences will be composed of isothermal blocks. The block sequences should differ from each other by at least one mismatch. Guidance for selecting such sequences is provided by published methods for selecting primer and/or probe sequences (Robertson et al., Methods Mol Biol; 98:121-54 (1998); Rychlik et al., Nucleic Acids Research, 17:8543-8551 (1989); Breslauer et al., Proc Natl Acad. Sci., 83:3746-3750 (1986)) and the like. Additional sets of sequences can be designed by extrapolating from the original family of non-cross-hybridizing sequences using simple methods known to those skilled in the art.
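A block set can be screened for the constant-composition and mismatch criteria above. As a crude proxy for duplex stability, this sketch uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is swapped in for illustration only and is much simpler than the nearest-neighbor approach of Breslauer et al.; the block list is hypothetical.

```python
def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    return 2 * at + 4 * gc


def hamming(a, b):
    """Number of mismatched positions between two equal-length blocks."""
    return sum(x != y for x, y in zip(a, b))


def check_block_set(blocks):
    """Return (one G + three A/T in every block, set of Wallace Tm values, all pairs differ)."""
    same_composition = all(
        len(b) == 4 and b.count("G") == 1 and set(b) <= set("ATG") for b in blocks
    )
    tms = {wallace_tm(b) for b in blocks}
    distinct = all(
        hamming(a, b) >= 1 for i, a in enumerate(blocks) for b in blocks[i + 1:]
    )
    return same_composition, tms, distinct


blocks = ["GATT", "TGAT", "ATGA", "TAAG", "GTTA", "AGTA"]  # hypothetical
print(check_block_set(blocks))  # (True, {10}, True)
```

A single Tm value across the set is what "isothermal" would look like under this proxy.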

A preferred family of 100 tags is shown as SEQ ID NOs:1 to 100 in Table I. The family of 100 sequence tags was characterized to determine the ability of these sequences to form specific duplex structures with their complementary sequences and to assess the potential for cross-hybridization. The 100 sequences were synthesized and spotted onto glass slides, where they were coupled to the surface by amine linkage. Complementary tag sequences were Cy3-labeled and hybridized individually to the array containing the family of 100 sequence tags. Formation of duplex structures was detected and quantified for each of the positions on the array. Each of the tag sequences performed as expected; that is, the perfect-match duplex was formed in the absence of significant cross-hybridization under stringent hybridization conditions. The results of a sample hybridization are shown in FIG. 1. FIG. 1A shows the hybridization pattern seen when a microarray containing all 100 probes was hybridized with the target complementary to probe 181234. The 4 sets of paired spots correspond to the probe complementary to the target. FIG. 1B shows the pattern seen when a similar array was hybridized with a mix of all 100 targets. These results indicate that the family of sequences that is the subject of this patent can be used as a family of non-cross-hybridizing (tag) sequences.

The family of 100 non-cross-hybridizing sequences can be expanded by incorporating additional tetramer sequences that are used in constructing further 24mer oligonucleotides. In one example, four additional words were included in the generation of new sequences to be considered for inclusion as non-cross-talkers in a family of sequences obtained from the above method using 10 tetramers. In this case, the four additional words were selected to avoid potential homologies with all potential combinations of the other words: YYXW (TTAG), WYYX (GTTA), XYXW (ATAG) and WYYY (GTTT), where W = G, X = A and Y = T. The total number of sequences containing six words drawn from the 14 possible words is 14⁶, or 7,529,536. These sequences were screened to eliminate sequences containing repetitive regions that present potential hybridization problems, such as four or more of the same base (e.g., AAAA or TTTT) or pairs of G's (GG). Each remaining sequence was compared to the sequence set of the original family of 100 non-cross-hybridizing sequences (SEQ ID NOs:1 to 100). Any new sequence that reached a threshold of homology (determined without insertions or deletions), such as 15 or more matches with any sequence of the original family, was eliminated. In other words, if it was possible to align a new sequence with one or more of the original 100 sequences so as to obtain a maximum simple homology of 15/24 or more, the new sequence was dropped. “Simple homology” between a pair of sequences is defined here as the number of pairs of nucleotides that match (are the same as each other) in a comparison of two aligned sequences, divided by the total number of potential matches. “Maximum simple homology” is obtained when two sequences are aligned with each other so as to have the maximum number of paired matching nucleotides. The set of new sequences so obtained was referred to as the “candidate sequences”. One of the candidate sequences was arbitrarily chosen and referred to as sequence 101.
All the candidate sequences were checked against sequence 101, and sequences that contained 15 or more non-consecutive matches (i.e., a maximum simple homology of 15/24 (62.5%) or more) were eliminated. This resulted in a smaller set of candidate sequences, from which another sequence was selected and referred to as sequence 102. The smaller set of candidate sequences was then compared to sequence 102, eliminating sequences that contained 15 or more non-consecutive matches, and the process was repeated until no candidate sequences remained. In addition, any sequence selected from the candidate sequences was eliminated if it had 13 or more consecutive matches with any previously selected candidate sequence.
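The screening and sequential selection just described can be sketched as follows. Assumptions: sequences are plain A/T/G strings; "maximum simple homology" is taken over ungapped shifts only; the "arbitrary" choice of the next sequence is simply the first one remaining; and the thresholds 15 and 13 are the ones quoted above. The function names are hypothetical.

```python
def max_simple_matches(a: str, b: str) -> int:
    """Maximum number of identical paired bases over all ungapped alignments of a and b."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(1 for i in range(len(b))
                      if 0 <= i + shift < len(a) and a[i + shift] == b[i])
        best = max(best, matches)
    return best


def longest_consecutive_matches(a: str, b: str) -> int:
    """Longest run of consecutive matching bases over all ungapped alignments of a and b."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        run = 0
        for i in range(len(b)):
            j = i + shift
            if 0 <= j < len(a) and a[j] == b[i]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def passes_repeat_screen(seq: str) -> bool:
    """Reject AAAA, TTTT and GG (sufficient for sequences built only from A, T and G)."""
    return not any(bad in seq for bad in ("AAAA", "TTTT", "GG"))


def candidate_sequences(new_seqs, family, max_matches=15):
    """Keep screened sequences with fewer than max_matches matches to every family member."""
    return [s for s in new_seqs
            if passes_repeat_screen(s)
            and all(max_simple_matches(s, f) < max_matches for f in family)]


def greedy_select(candidates, max_matches=15, max_run=13):
    """Pick candidates one by one (sequence 101, 102, ...), pruning the pool after each pick."""
    pool = list(candidates)
    selected = []
    while pool:
        pick = pool.pop(0)  # the text picks arbitrarily; this sketch takes the first
        if any(longest_consecutive_matches(pick, s) >= max_run for s in selected):
            continue  # dropped: too many consecutive matches with an earlier pick
        selected.append(pick)
        pool = [c for c in pool if max_simple_matches(pick, c) < max_matches]
    return selected


print(14 ** 6)  # 7529536 six-word sequences from 14 words
```

Running the full 7,529,536-sequence pool through these pure-Python loops would be slow; the sketch only shows the logic of the rules.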

The additional set of 73 tag sequences so obtained (SEQ ID NOs:101 to 173) is composed of sequences that, when compared to any of SEQ ID NOs:1 to 100 of Table I, have no greater similarity than the original 100 sequence tags of Table I have to each other. The sequence set derived from the original family of non-cross-hybridizing sequences, SEQ ID NOs:1 to 173, is therefore expected to have hybridization properties similar to those of the sequences having SEQ ID NOs:1 to 100, since sequence similarity is understood to correlate directly with cross-hybridization (Southern et al., Nat. Genet.; 21, 5-9: 1999).

The set of 173 24mer oligonucleotides was expanded to include those having SEQ ID NOs:174 to 210 as follows. The 4mers WXYW, WYXW, WXXW, WYYW, XYYX, YXYX, YXXY and XYXY, where W = G, X = A and Y = U/T, were used in combination with the fourteen 4mers used in the generation of SEQ ID NOs:1 to 173 to generate potential 24-base oligonucleotides. Sequences containing the patterns GG, AAAA or TTTT were excluded. To be included in the set of additional 24mers, a sequence also had to contain at least one of the 4mers containing two G's, WXYW (GATG), WYXW (GTAG), WXXW (GAAG) or WYYW (GTTG), while containing exactly six G's. It was further required that there be at most six bases between every neighboring pair of G's; put another way, there are at most six non-G's between any two G's. Also, the G nearest the 5′-end of each oligonucleotide (the left-hand side as written in Table I) was required to occupy one of the first to seventh positions (counting the 5′-terminal position as the first position). A set of candidate sequences was obtained by eliminating any new sequence found to have a maximum simple homology of 16/24 or more with any of the previous set of 173 oligonucleotides (SEQ ID NOs:1 to 173). As above, an arbitrary 174th sequence was chosen and candidate sequences were eliminated by comparison with it. In this case, the permitted maximum degree of simple homology was 16/24. A second sequence was also eliminated if there were ten consecutive matches between the two (i.e., it was notionally possible to generate a phantom sequence containing a sequence of 10 bases that is identical to a sequence in each of the sequences being compared). A second sequence was also eliminated if it was possible to generate a phantom sequence 20 bases in length or greater.
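The composition rules for SEQ ID NOs:174 to 210 can be expressed as a single filter. This sketch covers only the pattern, G-count, G-spacing and 5′-G-position rules (the homology comparisons are separate), and it assumes that the required two-G 4mer must occupy one of the six word positions of the 24mer; the function name and test sequences are hypothetical.

```python
TWO_G_WORDS = {"GATG", "GTAG", "GAAG", "GTTG"}  # WXYW, WYXW, WXXW, WYYW with W=G, X=A, Y=T


def passes_g_rules(seq: str) -> bool:
    """Pattern and G-placement rules for the additional 24mers (homology tests not included)."""
    if any(bad in seq for bad in ("GG", "AAAA", "TTTT")):
        return False
    words = [seq[i:i + 4] for i in range(0, len(seq), 4)]
    if not TWO_G_WORDS.intersection(words):
        return False  # needs at least one two-G word
    g_positions = [i for i, base in enumerate(seq) if base == "G"]
    if len(g_positions) != 6:
        return False  # exactly six G's
    if any(q - p - 1 > 6 for p, q in zip(g_positions, g_positions[1:])):
        return False  # at most six non-G bases between neighboring G's
    return g_positions[0] <= 6  # 5'-most G at positions 1-7 (0-based index 0-6)


print(passes_g_rules("GATGTATTGATTAGTATGATTAAG"))  # True
print(passes_g_rules("GATGTATTGATTAGTATGATTAAA"))  # False: only five G's
```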

A property of the polynucleotide sequences shown in Table I is that the maximum block homology between any two sequences is never greater than 66⅔ percent. This is because the computer algorithm by which the sequences were initially generated was designed to prevent such an occurrence. It is within the capability of a person skilled in the art, given the family of sequences of Table I, to modify the sequences, or add other sequences, while largely retaining the minimal cross-hybridization that the polynucleotides of Table I have been demonstrated to have.

There are 210 polynucleotide sequences given in Table I. Since all 210 polynucleotides of this family can work with each other as a minimally cross-hybridizing set, any plurality of polynucleotides that is a subset of the 210 can also act as a minimally cross-hybridizing set. An application in which, for example, 30 molecules are to be sorted using a family of polynucleotide tags and tag complements could thus use any group of 30 sequences shown in Table I. This is not to say that some subsets may not, in a practical sense, be found preferable to others. For example, a particular subset may be found to tolerate a wider variety of hybridization conditions before the degree of cross-hybridization becomes unacceptable.

It may be desirable to use polynucleotides shorter than the 24 bases of those in Table I. A family of subsequences (i.e., subframes of the sequences illustrated) based on those contained in Table I, having as few as 10 bases per sequence, could be chosen, so long as the subsequences are chosen to retain the homology properties between any two sequences of the family that are important to their non-cross-hybridization.

The selection of sequences using this approach would be amenable to a computerized process. Thus, for example, the string of 10 contiguous bases GATTTGTATT could be selected from the first 24mer of Table II, GATTTGTATTGATTGAGATTAAAG.

A string of contiguous bases, for example ATTGATAAAG, could then be selected from the second 24mer, TGATTGTAGTATGTATTGATAAAG, and compared for maximum homology against the first chosen sequence.

Systematic pairwise comparison could then be carried out to determine whether the maximum homology requirement of 66⅔ percent is violated. In each alignment below, the second 10mer (lower line) is moved one position to the right relative to the first (upper line); the number of matching paired bases is given at right, and the maximum is marked (*):

    Alignment                     Matches
             GATTTGTATT           1
    ATTGATAAAG

             GATTTGTATT           0
     ATTGATAAAG

             GATTTGTATT           1
      ATTGATAAAG

             GATTTGTATT           1
       ATTGATAAAG

             GATTTGTATT           1
        ATTGATAAAG

             GATTTGTATT           1
         ATTGATAAAG

             GATTTGTATT           3
          ATTGATAAAG

             GATTTGTATT           1
           ATTGATAAAG

             GATTTGTATT           2
            ATTGATAAAG

             GATTTGTATT           2
             ATTGATAAAG

             GATTTGTATT           5 (*)
              ATTGATAAAG

             GATTTGTATT           3
               ATTGATAAAG

             GATTTGTATT           3
                ATTGATAAAG

             GATTTGTATT           2
                 ATTGATAAAG

             GATTTGTATT           1
                  ATTGATAAAG

             GATTTGTATT           1
                   ATTGATAAAG

             GATTTGTATT           3
                    ATTGATAAAG

             GATTTGTATT           1
                     ATTGATAAAG

             GATTTGTATT           0
                      ATTGATAAAG

As can be seen, the maximum homology between the two selected subsequences is 50 percent (5 matches out of a total length of 10). This does not exceed 66⅔ percent, so these two sequences are compatible with each other.
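The sliding comparison above is straightforward to automate. This sketch computes the match count at every ungapped offset of the second 10mer against the first, in the same left-to-right order as the listed alignments, and applies the 66⅔ percent limit using integer arithmetic; the function name is hypothetical.

```python
def alignment_profile(a: str, b: str) -> list:
    """Matches at each ungapped offset, from b fully left of a to b fully right of a."""
    profile = []
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(1 for i in range(len(b))
                      if 0 <= i + shift < len(a) and a[i + shift] == b[i])
        profile.append(matches)
    return profile


first, second = "GATTTGTATT", "ATTGATAAAG"
profile = alignment_profile(first, second)
best = max(profile)
print(profile)
# compatible if best / len(first) <= 2/3, checked without floating point
print(best, 3 * best <= 2 * len(first))  # 5 True
```

The same check, repeated pairwise as each new 10mer is chosen, builds up the family of shorter tags.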

A 10mer subsequence can then be selected from the third 24mer sequence of Table I and compared pairwise with each of the first two 10mer sequences to determine its compatibility with them, and so on; in this way, a family of 10mer sequences can be developed.

It is within the scope of this invention to obtain families of sequences containing 11mer, 12mer, 13mer, 14mer, 15mer, 16mer, 17mer, 18mer, 19mer, 20mer, 21mer, 22mer and 23mer sequences by analogy with that shown for 10mer sequences.

It may be desirable to have a family of sequences in which some sequences are longer than the 24mer sequences shown in Table I. It is within the capability of a person skilled in the art, given the family of sequences shown in Table I, to obtain such a family of sequences. One possible approach would be to insert into each sequence, at one or more locations, a nucleotide, non-natural base or analogue, such that the longer sequence does not have greater similarity than any two of the original non-cross-hybridizing sequences of Table I, and such that the addition of extra bases does not cause a major change in the thermodynamic properties of the tag sequences of that set; for example, the GC content must be maintained between 10% and 40%, with a variance from the average of 20%. This method of inserting bases could be used to obtain a family of sequences up to 40 bases long.

Given a particular family of sequences that can be used as a family of tags (or tag complements), e.g., those of Table I or Table II, or the combined sequences of these two tables, a skilled person will readily recognize variant families that work equally well.

Again taking the sequences of Table I as an example, every T could be converted to an A and vice versa, and no significant change in the cross-hybridization properties would be expected. This would also be true if every G were converted to a C.

Also, all of the sequences of a family could be taken to be constructed in the 5′-3′ direction, as is the convention, or all of the sequences could be constructed in the opposite direction (3′-5′).

Additional modifications can be carried out. For example, C has not been used in the family of sequences. Substituting C in place of one or more G's of a particular sequence would therefore yield a sequence whose homology with every other sequence of the family is at most that of the unmodified sequence, because a C cannot match any base in the other sequences. It is thus possible to substitute C in place of one or more G's in any of the sequences shown in Table I. Analogously, C can be substituted in place of one or more A's, or in place of one or more T's.
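The argument can be checked numerically: because no other family member contains C, converting a position to C can only remove matches at each alignment offset, so the maximum cannot rise. This sketch substitutes C at two G positions of one quoted 24mer (positions chosen arbitrarily for illustration) and compares its maximum simple homology with another quoted 24mer before and after; the helper names are hypothetical.

```python
def max_simple_matches(a: str, b: str) -> int:
    """Maximum number of identical paired bases over all ungapped alignments of a and b."""
    best = 0
    for shift in range(-(len(b) - 1), len(a)):
        matches = sum(1 for i in range(len(b))
                      if 0 <= i + shift < len(a) and a[i + shift] == b[i])
        best = max(best, matches)
    return best


def substitute_c(seq: str, positions) -> str:
    """Return seq with C placed at the given 0-based positions."""
    chars = list(seq)
    for p in positions:
        chars[p] = "C"
    return "".join(chars)


original = "GATTTGTATTGATTGAGATTAAAG"
other = "TGATTGTAGTATGTATTGATAAAG"
modified = substitute_c(original, [0, 5])  # both positions hold G in the original
print(modified)
print(max_simple_matches(original, other), max_simple_matches(modified, other))
```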

It is preferred that the sequences of a given family are of the same, or roughly the same, length. Preferably, all the sequences of a family of sequences of this invention have a length within five bases of the average base-length of the family. More preferably, all sequences are within four bases of the average base-length. Even more preferably, all or almost all sequences are within three bases of the average base-length of the family. Better still, all or almost all sequences have a length within one base of the average base-length of the family.

It is also possible for a person skilled in the art to derive sets of sequences from the family of sequences that is the subject of this patent and to remove sequences that would be expected to have undesirable hybridization properties.

Methods For Synthesis of Oligonucleotide Families

Preferably, oligonucleotide sequences of the invention are synthesized directly by standard phosphoramidite synthesis approaches and the like (Caruthers et al., Methods in Enzymology; 154, 287-313: 1987; Lipshutz et al., Nature Genet.; 21, 20-24: 1999; Fodor et al., Science; 251, 763-773: 1991). Alternative chemistries involving non-natural bases, such as peptide nucleic acids or modified nucleosides, that offer advantages in duplex stability may also be used (Hacia et al., Nucleic Acids Res; 27:4034-4039, 1999; Nguyen et al., Nucleic Acids Res.; 27, 1492-1498: 1999; Weiler et al., Nucleic Acids Res.; 25, 2792-2799: 1997). It is also possible to synthesize the oligonucleotide sequences of this invention with alternate nucleotide backbones, such as phosphorothioate or phosphoramidate nucleotides. Methods involving synthesis through the stepwise addition of blocks of sequence may also be employed (Lyttle et al., Biotechniques, 19: 274-280 (1995)). Synthesis may be carried out directly on the substrate to be used as a solid phase support for the application, or the oligonucleotide can be cleaved from the support for use in solution or for coupling to a second support.

Solid Phase Supports

Several different solid phase supports can be used with the invention. They include, but are not limited to, slides, plates, chips, membranes, beads, microparticles and the like. The solid phase supports can also vary in the materials of which they are composed, including plastic, glass, silicon, nylon, polystyrene, silica gel, latex and the like. The surface of the support is coated with the tag complement sequences.

In preferred embodiments, the family of tag complement sequences is derivatized to allow binding to a solid support. Many methods of derivatizing a nucleic acid for binding to a solid support are known in the art (Hermanson G., Bioconjugate Techniques; Acad. Press: 1996). The sequence tag may be bound to a solid support through covalent or non-covalent bonds (Iannone et al., Cytometry; 39: 131-140, 2000; Matson et al., Anal. Biochem.; 224: 110-106, 1995; Proudnikov et al., Anal Biochem; 259: 34-41, 1998; Zammatteo et al., Analytical Biochemistry; 280: 143-150, 2000). The sequence tag can be conveniently derivatized for binding to a solid support by incorporating modified nucleic acids in the terminal 5′ or 3′ locations.

A variety of moieties useful for binding to a solid support (e.g., biotin, antibodies and the like), and methods for attaching them to nucleic acids, are known in the art. For example, an amine-modified nucleic acid base (available from, e.g., Glen Research) may be attached to a solid support (for example, Covalink-NH, a polystyrene surface grafted with secondary amino groups, available from Nunc) through a bifunctional crosslinker (e.g., bis(sulfosuccinimidyl) suberate, available from Pierce). Additional spacing moieties can be added to reduce steric hindrance between the capture moiety and the surface of the solid support.

Attaching Tags to Analytes for Sorting

A family of oligonucleotide tag sequences can be conjugated to a population of analytes, most preferably polynucleotide sequences, in several different ways, including but not limited to direct chemical synthesis, chemical coupling, ligation, amplification and the like. Sequence tags that have been synthesized with primer sequences can be used for enzymatic extension of the primer on the target, for example in PCR amplification.

Detection of Single Nucleotide Polymorphisms Using Primer Extension

There are a number of areas of genetic analysis where families of non-cross-hybridizing sequences can be applied, including disease diagnosis, single nucleotide polymorphism analysis, genotyping, expression analysis and the like. One such approach, referred to as the primer extension method (also known as Genetic Bit Analysis (Nikiforov et al., Nucleic Acids Res.; 22, 4167-4175: 1994; Head et al., Nucleic Acids Res.; 25, 5065-5071: 1997)), is an extremely accurate method for identifying the nucleotide located at a specific polymorphic site within genomic DNA. In standard primer extension reactions, a portion of genomic DNA containing a defined polymorphic site is amplified by PCR using primers that flank the polymorphic site. To identify which nucleotide is present at the polymorphic site, a third primer is synthesized such that the polymorphic position is located immediately 3′ to the primer. A primer extension reaction is set up containing the amplified DNA, the primer for extension, up to 4 dideoxynucleoside triphosphates, each labelled with a different fluorescent dye, and a DNA polymerase such as the Klenow fragment of DNA polymerase I. The use of dideoxynucleotides ensures that a single base is added to the 3′ end of the primer, at the site corresponding to the polymorphic site. In this way, the identity of the nucleotide present at a specific polymorphic site can be determined from the identity of the fluorescent dye-labelled nucleotide incorporated in each reaction. One major drawback of this approach is its low throughput: each primer extension reaction is carried out independently in a separate tube.
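The single-base readout can be modelled in a few lines. The primer pairs antiparallel with the template, so its 3′ end faces the template base just 5′ of the binding site; that base is the polymorphic position, and the polymerase adds the complementary dideoxynucleotide. This sketch assumes an exact, unique primer-binding site and uses a hypothetical template, primer and function name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def single_base_extension(template: str, primer: str) -> str:
    """Name of the ddNTP a polymerase would add to the primer annealed to template (both 5'->3')."""
    site = template.find(reverse_complement(primer))
    if site <= 0:
        raise ValueError("primer does not anneal, or no template base lies beyond its 3' end")
    # The primer's 3' end pairs with template[site]; extension copies template[site - 1],
    # the polymorphic position immediately 3' of the primer.
    return "dd" + template[site - 1].translate(COMPLEMENT) + "TP"


template = "GGATCCAGTT"  # hypothetical; the polymorphic base is the T at index 3
primer = "AACTGG"        # pairs with template[4:10]
print(single_base_extension(template, primer))  # ddATP
```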

Universal sequences can be used to increase the throughput of the primer extension assay as follows. A region of genomic DNA containing multiple polymorphic sites is amplified by PCR. Alternatively, several genomic regions, each containing one or more polymorphic sites, are amplified together in a multiplexed PCR reaction. The primer extension reaction is carried out as described above, except that the primers used are chimeric, each containing a unique universal tag at the 5′ end and the sequence for extension at the 3′ end. In this way, each gene-specific sequence would be associated with a specific universal sequence. The chimeric primers would be hybridized to the amplified DNA and primer extension carried out as described above. This would result in a mixed pool of extended primers, each carrying the fluorescent dye characteristic of the incorporated nucleotide. Following the primer extension reaction, the mixed extension products are hybridized to an array containing probes that are reverse complements of the universal sequences on the primers. This would segregate the products of a number of primer extension reactions into discrete spots. The fluorescent dye present at each spot would then identify the nucleotide incorporated at each specific location.
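Decoding the array then reduces to two lookups: locus to tag (fixed when the chimeric primers are designed) and dye to base (fixed by the ddNTP labelling). This sketch assumes spot signals are reported per tag as the set of dyes detected, and it treats two dyes at one spot as two alleles; the locus names, tag names and dye-to-base assignment are all hypothetical.

```python
# Hypothetical dye assignment; the text does not say which dye labels which ddNTP.
DYE_TO_BASE = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}


def call_genotypes(tag_of_locus, spot_signals):
    """Map each locus to the base(s) read at the spot carrying its tag complement."""
    calls = {}
    for locus, tag in tag_of_locus.items():
        dyes = spot_signals.get(tag, set())
        calls[locus] = "".join(sorted(DYE_TO_BASE[d] for d in dyes)) or None
    return calls


tag_of_locus = {"snp1": "tag07", "snp2": "tag12", "snp3": "tag31"}  # one unique tag per locus
spot_signals = {"tag07": {"dye1", "dye3"}, "tag12": {"dye4"}}       # no signal at tag31
print(call_genotypes(tag_of_locus, spot_signals))
# {'snp1': 'AG', 'snp2': 'T', 'snp3': None}
```

Because the tags are minimally cross-hybridizing, each spot is expected to report only the extension product of its own locus.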

Kits Using Families of Tag Sequences

The families of non-cross-hybridizing sequences may be provided in kits for use in, for example, genetic analysis. Such kits include at least one set of non-cross-hybridizing sequences in solution or on a solid support. Preferably the sequences are attached to microparticles and are provided with buffers and reagents that are appropriate for the application. Reagents may include enzymes, nucleotides, fluorescent labels and the like that would be required for specific applications. Instructions for correct use of the kit for a given application will be provided.

EXAMPLES

Example 1 Demonstration of Non-Cross-Talk Behavior on a Solid Array

One hundred oligonucleotide probes corresponding to a family of non-cross-talking oligonucleotides from Table I were synthesized by Integrated DNA Technologies (IDT, Coralville, Iowa). These oligonucleotides incorporated a C₆ aminolink group coupled to the 5′ end of the oligo through a C₁₈ ethylene glycol spacer. These probes were used to prepare microarrays as follows. The probes were resuspended at a concentration of 50 μM in 150 mM NaPO₄, pH 8.5. The probes were spotted onto the surface of a SuperAldehyde slide (Telechem Int., Sunnyvale, Calif.) using an SDDC-II microarray spotter (ESI, Toronto, Ontario, Canada). The spots formed were approximately 120 μm in diameter with 200 μm centre-to-centre spacing. Each probe was spotted 8 times on each microarray. Following spotting, the arrays were processed essentially as described by the slide manufacturer. Briefly, the arrays were treated with 67 mM sodium borohydride in PBS/EtOH (3:1) for 5 minutes, then washed with 4 changes of 0.1% SDS. The arrays were not boiled.

One hundred labelled oligonucleotide targets were also synthesized by IDT. The sequences of these targets corresponded to the reverse complements of the 100 probe sequences. The targets were labelled at the 5′ end with Cy3.

Each Cy3-labeled target oligonucleotide was hybridized separately to two microarrays, each of which contained all 100 oligonucleotide probes. Hybridizations were carried out at 42° C. for 2 hours in a 40 μl reaction containing 40 nM of the labelled target suspended in 10 mM Tris HCl, pH 8.3, 50 mM KCl, 0.1% Tween 20. These are low-stringency hybridization conditions designed to provide a rigorous test of the performance of the family of non-cross-hybridizing sequences. Hybridizations were carried out by depositing the hybridization solution on a clean cover slip, then carefully positioning the microarray slide over the cover slip in order to avoid bubbles. The slide was then inverted and transferred to a humid chamber for incubation. Following hybridization, the cover slip was removed and the microarray was washed in hybridization buffer for 15 minutes at room temperature. The slide was then dried by brief centrifugation.

Hybridized microarrays were scanned using a ScanArray Lite (GSI-Lumonics, Billerica, Mass.). The laser power and photomultiplier tube voltage used for scanning each hybridized microarray were optimized in order to maximize the signal intensity from the spots representing the perfect match.

The results of a sample hybridization are shown in FIGS. 1A and 1B. FIG. 1A shows the hybridization pattern seen when a microarray containing all 100 probes was hybridized with the target complementary to probe 181234. The 4 sets of paired spots correspond to the probe complementary to the target. FIG. 1B shows the pattern seen when a similar array was hybridized with a mix of all 100 targets.

Similar results to those illustrated in FIG. 1A were obtained for all of the sequences tested, which establishes the feasibility of using molecules containing the oligonucleotides of SEQ ID NOs: 1 to 100 as a set of tags (or tag complements).

Example 2 Cross Talk Behavior of Sequences on Beads

A group of 100 of the sequences of Table I was tested for feasibility for use as a family of minimally cross-hybridizing oligonucleotides. The 100 sequences selected are separately indicated in Table I along with the numbers assigned to the sequences in the tests.

The tests were conducted using the Luminex LabMAP™ platform available from Luminex Corporation, Austin, Tex., U.S.A. The one hundred sequences, used as probes, were synthesized as oligonucleotides by Integrated DNA Technologies (IDT, Coralville, Iowa, U.S.A.). Each probe included a C₆ aminolink group coupled to the 5′-end of the oligonucleotide through a C₁₂ ethylene glycol spacer. The C₆ aminolink molecule is a six-carbon spacer containing an amine group that can be used for attaching the oligonucleotide to a solid support. One hundred oligonucleotide targets (probe complements), the sequence of each being the reverse complement of one of the 100 probe sequences, were also synthesized by IDT. Each target was labelled at its 5′-end with biotin. All oligonucleotides were purified using standard desalting procedures and were reconstituted to a concentration of approximately 200 μM in sterile, distilled water for use. Oligonucleotide concentrations were determined spectrophotometrically using extinction coefficients provided by the supplier.

Each probe was coupled by its amino linking group to a carboxylated fluorescent microsphere of the LabMAP system according to the Luminex 100 protocol. The microsphere, or bead, for each probe sequence has unique, or spectrally distinct, light absorption characteristics, which permit each probe to be distinguished from the other probes. Stock bead pellets were dispersed by sonication and then vortexing. For each bead population, approximately five million microspheres (400 μL) were removed from the stock tube using barrier tips and added to a 1.5 mL Eppendorf tube (USA Scientific). The microspheres were then centrifuged, the supernatant was removed, and the beads were resuspended in 25 μL of 0.2 M MES (2-(N-morpholino)ethane sulfonic acid) (Sigma), pH 4.5, followed by vortexing and sonication. One nmol of each probe (in a 25 μL volume) was added to its corresponding bead population. A volume of 2.5 μL of EDC cross-linker (1-ethyl-3-(3-dimethylaminopropyl) carbodiimide hydrochloride (Pierce)), prepared immediately before use by adding 1.0 mL of sterile ddH₂O to 10 mg of EDC powder, was added to each microsphere population. Bead mixes were then incubated for 30 minutes at room temperature in the dark with periodic vortexing. A second 2.5 μL aliquot of freshly prepared EDC solution was then added, followed by an additional 30 minute incubation in the dark. Following the second EDC incubation, 1.0 mL of 0.02% Tween-20 (BioShop) was added to each bead mix and vortexed. The microspheres were centrifuged, the supernatant was removed, and the beads were resuspended in 1.0 mL of 0.1% sodium dodecyl sulfate (Sigma). The beads were centrifuged again and the supernatant removed. The coupled beads were resuspended in 100 μL of 0.1 M MES, pH 4.5. Bead concentrations were then determined by diluting each preparation 100-fold in ddH₂O and enumerating using a Neubauer BrightLine Hemacytometer. Coupled beads were stored as individual populations at 2-8° C., protected from light.

The relative oligonucleotide probe density on each bead population was assessed by Terminal Deoxynucleotidyl Transferase (TdT) end-labelling with biotin-ddUTP. TdT was used to label the 3′-ends of single-stranded DNA with a labelled ddNTP. Briefly, 180 μL of the pool of 100 bead populations (equivalent to about 4000 of each bead type) to be used for hybridizations was pipetted into an Eppendorf tube and centrifuged. The supernatant was removed, and the beads were washed in 1×TdT buffer. The beads were then incubated with a labelling reaction mixture, which consisted of 5×TdT buffer, 25 mM CoCl₂, and 1000 pmol of biotin-16-ddUTP (all reagents were purchased from Roche). The total reaction volume was brought up to 85.5 μL with sterile, distilled H₂O, and the samples were incubated in the dark for 1 hour at 37° C. A second aliquot of enzyme was added, followed by a second 1 hour incubation. Samples were run in duplicate, as was the negative control, which contained all components except the TdT. In order to remove unincorporated biotin-ddUTP, the beads were washed 3 times with 200 μL of hybridization buffer and resuspended in 50 μL of hybridization buffer following the final wash. The biotin label was detected spectrophotometrically using SA-PE (streptavidin-phycoerythrin conjugate). The streptavidin binds to biotin, and the phycoerythrin is spectrally distinct from the probe beads. The 10 mg/mL stock of SA-PE was diluted 100-fold in hybridization buffer, and 15 μL of the diluted SA-PE was added directly to each reaction and incubated for 15 minutes at 37° C. The reactions were analyzed on the Luminex 100 LabMAP. Acquisition parameters were set to measure 100 events per bead using a sample volume of 50 μL.

The results obtained are shown in FIG. 2. As can be seen, the Mean Fluorescent Intensity (MFI) of the beads varies from 277.75 to 2291.08, a range of 8.25-fold. Assuming that the labelling reactions are complete for all of the oligonucleotides, this illustrates the signal intensity that would be obtained for each type of bead at this concentration if the target (i.e., labelled complement) were bound to the probe sequence to the full extent possible.
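As a quick arithmetic check, the stated fold range follows directly from the two reported extremes (variable names are illustrative):

```python
min_mfi, max_mfi = 277.75, 2291.08  # lowest and highest bead MFI reported for FIG. 2
fold = max_mfi / min_mfi            # about 8.2487
print(f"{fold:.2f}-fold")
```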

The cross-hybridization of targets to probes was evaluated as follows. The 100 oligonucleotide probes linked to 100 different bead populations, as described above, were combined to generate a master bead mix, enabling multiplexed reactions to be carried out. The pool of microsphere-immobilized probes was then hybridized individually with each biotinylated target. Thus, each target was examined individually for its specific hybridization with its complementary bead-immobilized sequence, as well as for its non-specific hybridization with the other 99 bead-immobilized universal sequences present in the reaction. For each hybridization reaction, 25 μL of bead mix (containing about 2500 of each bead population in hybridization buffer) was added to each well of a 96-well Thermowell PCR plate and equilibrated at 37° C. Each target was diluted to a final concentration of 0.002 fmol/μL in hybridization buffer, and 25 μL (50 fmol) was added to each well, giving a final reaction volume of 50 μL. Hybridization buffer consisted of 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0, and hybridizations were performed at 37° C. for 30 minutes. Each target was analyzed in triplicate, and six background samples (i.e., no target) were included in each plate. An SA-PE conjugate was used as a reporter, as described above. The 10 mg/mL stock of SA-PE was diluted 100-fold in hybridization buffer, and 15 μL of the diluted SA-PE was added directly to each reaction, without removal of unbound target, and incubated for 15 minutes at 37° C. Finally, an additional 35 μL of hybridization buffer was added to each well, resulting in a final volume of 100 μL per well prior to analysis on the Luminex 100 LabMAP. Acquisition parameters were set to measure 100 events per bead using a sample volume of 80 μL.

The percent hybridization was calculated for any event in which the net MFI was at least 3 times the zero-target background. In other words, a calculation was made for any sample where

$$\frac{\mathrm{MFI}_{\text{sample}} - \mathrm{MFI}_{\text{zero target background}}}{\mathrm{MFI}_{\text{zero target background}}} \geq 3.$$

A “positive” cross-talk event (i.e., significant mismatch or cross-hybridization) was defined as any event in which the net median fluorescent intensity ($\mathrm{MFI}_{\text{sample}} - \mathrm{MFI}_{\text{zero target background}}$) generated by a mismatched hybrid was greater than or equal to the arbitrarily set limit of 10% of that of the perfectly matched hybrid determined under identical conditions. As there are 100 probes and 100 targets, there are 100 × 100 = 10,000 possible different interactions, of which 100 are the result of perfect hybridizations. The remaining 9900 result from hybridization of a target with a mismatched probe.
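The two scoring rules above (the 3× background cut-off and the 10% cross-talk limit) can be expressed as a short sketch. It assumes one zero-target background value per bead population and applies the 10% test only to events that pass the 3× cut-off; both are readings of the text rather than stated procedures, and the function name and toy data are hypothetical.

```python
def crosstalk_events(mfi, background, signal_ratio=3.0, limit_percent=10.0):
    """Flag mismatched target/probe pairs as cross-talk events.

    mfi[t][p]     -- MFI of probe bead p after hybridizing target t;
                     the perfect match for target t is probe t.
    background[p] -- zero-target MFI of probe bead p (assumed per bead).
    A pair is scored only if its net MFI >= signal_ratio x background;
    it is a cross-talk event if that net MFI is >= limit_percent % of
    the net MFI of target t's perfectly matched hybrid.
    """
    n = len(mfi)
    events = []
    for t in range(n):
        perfect_net = mfi[t][t] - background[t]
        if perfect_net <= 0:
            continue  # no specific signal to compare against
        for p in range(n):
            if p == t:
                continue
            net = mfi[t][p] - background[p]
            if net < signal_ratio * background[p]:
                continue  # below the 3x-background cut-off: not scored
            percent = 100.0 * net / perfect_net
            if percent >= limit_percent:
                events.append((t, p, round(percent, 1)))
    return events


# Toy 3 x 3 data (hypothetical); zero-target background of 10 for every bead.
mfi = [[1010, 20, 150],
       [15, 2010, 60],
       [12, 35, 510]]
print(crosstalk_events(mfi, [10, 10, 10]))

n = 100
print(n * n - n)  # number of mismatched target/probe pairs
```

With 100 targets and 100 probes, the same loop examines the 9900 mismatched pairs counted in the text.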

The results obtained are illustrated in FIG. 3. The ability of each target to be specifically recognized by its matching probe is shown. Of the possible 9900 non-specific hybridization events that could have occurred when the 100 targets were each exposed to the pool of 100 probes, 6 events were observed. Of these 6 events, the highest non-specific event generated a signal equivalent to 10.2% of the signal observed for the perfectly matched pair (i.e., the specific hybridization event).

Each of the 100 targets was thus examined individually for specific hybridization with its complement sequence as incorporated onto a microsphere, as well as for non-specific hybridization with the complements of the other 99 target sequences. Representative hybridization results for target 16 (complement of probe 16, Table I) are shown in FIG. 4. Probe 16 was found to hybridize only to its perfectly matched target. No cross-hybridization with any of the other 99 targets was observed.

The foregoing results demonstrate the possibility of incorporating the 210 sequences of Table I, or any subset thereof, into a multiplexed system with the expectation that most if not all sequences can be distinguished from the others by hybridization. That is, it is possible to distinguish each target from the other targets by hybridization of the target with its precise complement and minimal hybridization with the complements of the other targets.

Example 3 Tag Sequences Used in Sorting Polynucleotides

The family of non-cross-hybridizing sequence tags, or a subset thereof, can be attached to oligonucleotide probe sequences during synthesis and used to generate amplified probe sequences. In order to test the feasibility of PCR amplification with non-cross-hybridizing sequence tags and of subsequently addressing each respective sequence to its appropriate location on two-dimensional or bead arrays, the following experiment was devised. A 24mer tag sequence was connected in a 5′-3′ specific manner to a p53 exon-specific sequence (20mer reverse primer). The connecting p53 sequence represented the inverse complement of the nucleotide gene sequence. To facilitate the subsequent generation of single-stranded DNA post-amplification, the tag-Reverse primer was synthesized with a phosphate modification (PO₄) on the 5′-end. A second PCR primer was also generated for each desired exon, which represented the Forward (5′-3′) amplification primer. In this instance the Forward primer was labeled with a 5′-biotin modification to allow detection with Cy3-avidin or equivalent.

A practical example of the aforementioned description is as follows. For exon 1 of the human p53 tumor suppressor gene sequence, the following tag-Reverse primer was generated:

                                222087            222063
5′-PO4-GATTGTAAGATTTGATAAAGTGTA-TCCAGGGAAGCGTGTCACCGTCGT-3′
       Tag Sequence #3          Exon 1 Reverse

The numbering above the Exon 1 reverse primer represents the genomic nucleotide positions of the indicated bases.

The corresponding Exon 1 Forward primer sequence is as follows:

          221873            221896
5′-Biotin-TCATGGCGACTGTCCAGCTTTGTG-3′

In combination, these primers will amplify a product of 214 bp plus a 24 bp tag extension, yielding a total size of 238 bp.
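A small sketch assembling the tag-Reverse primer from the sequences printed above and checking the stated product size. Variable names are illustrative, and the 5′ chemical modifications (phosphate, biotin) are not modelled.

```python
TAG_3 = "GATTGTAAGATTTGATAAAGTGTA"          # Tag Sequence #3 (SEQ ID NO: 3)
EXON1_REVERSE = "TCCAGGGAAGCGTGTCACCGTCGT"  # gene-specific part; 24 nt as printed
EXON1_FORWARD = "TCATGGCGACTGTCCAGCTTTGTG"  # 5'-biotinylated in the text

# Tag joined 5' of the gene-specific sequence; the 5'-PO4 group is a
# chemical modification and is not represented in the string.
tag_reverse_primer = TAG_3 + EXON1_REVERSE

AMPLICON_BP = 214  # product size stated for the gene-specific primer pair
tagged_product_bp = AMPLICON_BP + len(TAG_3)

print(len(TAG_3), len(EXON1_REVERSE), len(tag_reverse_primer))
print(tagged_product_bp)
```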

Once amplified, the PCR product was purified using a QIAquick PCR purification kit and the resulting DNA was quantified. To generate single-stranded DNA, the DNA was subjected to λ-exonuclease digestion, thereby exposing a single-stranded sequence (anti-tag) complementary to the tag sequence covalently attached to the solid-phase array. The resulting product was heated to 95° C. for 5 minutes and then directly applied to the array at a concentration of 10-50 nM. Following hybridization and concurrent sorting, the tag-Exon 1 sequences were visualized using Cy3-streptavidin. In addition to direct visualization of the biotinylated product, the product itself can now act as a substrate for further analysis of the amplified region, such as SNP detection and haplotype determination.
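The strand logic of the λ-exonuclease step can be traced with string operations. This sketch assumes the usual behavior of lambda exonuclease (preferential digestion of the 5′-phosphorylated strand) and uses a short hypothetical sequence in place of the internal amplicon region, which the text does not give.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


TAG = "GATTGTAAGATTTGATAAAGTGTA"       # Tag Sequence #3
REV_GENE = "TCCAGGGAAGCGTGTCACCGTCGT"  # gene part of the tag-Reverse primer
FWD = "TCATGGCGACTGTCCAGCTTTGTG"       # biotinylated Forward primer
MIDDLE = "ACGTACGTAC"  # hypothetical stand-in for the sequence between primer sites

# Biotinylated strand, 5'->3': starts with the Forward primer and ends with
# the complement of the tag, copied from the tag-Reverse primer during PCR.
biotin_strand = FWD + MIDDLE + revcomp(REV_GENE) + revcomp(TAG)
# Phosphorylated strand, 5'->3': starts with the 5'-PO4 tag-Reverse primer.
phospho_strand = revcomp(biotin_strand)

# Lambda exonuclease preferentially digests the 5'-phosphorylated strand,
# so the biotinylated strand is left single-stranded.
survivor = biotin_strand
anti_tag = survivor[-len(TAG):]  # 3'-terminal anti-tag

print(phospho_strand.startswith(TAG + REV_GENE))
print(anti_tag == revcomp(TAG))
```

The surviving strand therefore carries both the biotin used for Cy3-streptavidin detection and the anti-tag that hybridizes to the tag on the array.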

A number of additional methods for the detection of single nucleotide polymorphisms, including but not limited to allele-specific polymerase chain reaction (ASPCR), allele-specific primer extension (ASPE) and the oligonucleotide ligation assay (OLA), can be performed by those skilled in the art in combination with the tag sequences described herein.

DEFINITIONS

Non cross hybridization: Describes the absence of hybridization between two sequences that are not perfect complements of each other.

Cross Hybridization: The hydrogen bonding of a single-stranded DNA sequence that is partially but not entirely complementary to a single-stranded substrate.

Homology: How closely related two or more separate strands of DNA are to each other, based on their base sequences.

Analogue: A chemical which resembles a nucleotide base; a base which does not normally appear in DNA but can substitute for the ones which do, despite minor differences in structure.

Complement: The opposite or “mirror” image of a DNA sequence. A complementary DNA sequence has an “A” for every “T” and a “C” for every “G”. Two complementary strands of single-stranded DNA, for example a tag sequence and its complement, will join to form a double-stranded molecule.
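Because paired strands are antiparallel, the partner of a strand read 5′ to 3′ is its reverse complement rather than its simple base-by-base complement. A minimal sketch of the distinction (function names are illustrative):

```python
def complement(seq: str) -> str:
    """Base-by-base complement (A<->T, C<->G), orientation unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))


def reverse_complement(seq: str) -> str:
    """Complement read in the opposite direction: the partner strand, 5'->3'."""
    return complement(seq)[::-1]


def forms_perfect_duplex(strand_a: str, strand_b: str) -> bool:
    """True if strand_b (5'->3') pairs with every base of strand_a (5'->3')."""
    return strand_b == reverse_complement(strand_a)


tag = "GATTGTAAGATTTGATAAAGTGTA"  # SEQ ID NO: 3
print(complement("AAGC"), reverse_complement("AAGC"))
print(forms_perfect_duplex(tag, reverse_complement(tag)))
print(forms_perfect_duplex(tag, complement(tag)))
```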

Complementary DNA (cDNA): DNA that is synthesized from a messenger RNA template; the single-stranded form is often used as a probe in physical mapping.

Oligonucleotide: Refers to a short nucleotide polymer whereby the nucleotides may be natural nucleotide bases or analogues thereof.

Tag: Refers to an oligonucleotide that can be used for specifically sorting analytes together with at least one other oligonucleotide, where the oligonucleotides used together do not cross-hybridize.

Similar Homology: In the context of this invention, pairs of sequences are compared with each other based on the amount of “homology” between the sequences. By way of example, two sequences are said to have a 50% “maximum homology” with each other if, when the two sequences are aligned side-by-side with each other so as to obtain the (absolute) maximum number of identically paired bases, the number of identically paired bases is 50% of the total number of bases in one of the sequences. (If the sequences being compared are of different lengths, then it would be of the total number of bases in the shorter of the two sequences.) Examples of determining maximum homology are as follows:

Example 4 Determining Maximum Homology

    *   *
A-A-B-B-C-C
    B-D-C-D-D-D    (2 out of 4 paired bases are the same)

      *   *
A-A-B-B-C-C
      B-D-C-D-D-D    (2 out of 3 paired bases are the same)

In this case, the maximum number of identically paired bases is two, and there are two possible alignments yielding this maximum number. The total number of possible pairings is six, giving 33⅓% (2/6) homology. The maximum amount of homology between the two sequences is thus ⅓.

Example 5 Determining Maximum Homology

* *     *
A-A-B-B-C-A
A-A-D-D-C-D    (3 out of 6 paired bases are the same)

In this alignment, the number of identically paired bases is three and the total number of possibly paired bases is six, so the homology between the two sequences is 3/6 (50%).

          *
A-A-B-B-C-A
          A-A-D-D-C-D    (1 out of 1 paired bases are the same)

In this alignment, the number of identically paired bases is 1, so the homology between the two sequences is ⅙ (16⅔%).

The maximum homology between these two sequences is thus 50%.
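The maximum-homology rule worked through in Examples 4 and 5 can be sketched as an ungapped sliding comparison. This is an illustrative implementation, not necessarily the procedure used in the invention: it does not model insertions or deletions, and the hyphens of the examples are dropped.

```python
from fractions import Fraction


def max_homology(s1: str, s2: str) -> Fraction:
    """Maximum homology by ungapped sliding, as in Examples 4 and 5.

    s2 is slid along s1 one position at a time; at each offset the
    identically paired positions in the overlap are counted.  The best
    count is divided by the length of the shorter sequence.
    """
    best = 0
    for offset in range(-(len(s2) - 1), len(s1)):
        matches = sum(
            1
            for j, base in enumerate(s2)
            if 0 <= j + offset < len(s1) and s1[j + offset] == base
        )
        best = max(best, matches)
    return Fraction(best, min(len(s1), len(s2)))


print(max_homology("AABBCC", "BDCDDD"))  # Example 4
print(max_homology("AABBCA", "AADDCD"))  # Example 5
```

Using `Fraction` keeps values such as ⅓ exact, which makes comparisons against a homology threshold unambiguous.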

Block sequence: Refers to a symbolic representation of a sequence of blocks. In its most general form, a block sequence is a representative sequence in which no particular value, mathematical variable, or other designation is assigned to each block of the sequence.

Incidence Matrix: As used herein, this is a well-defined term in the field of Discrete Mathematics. However, an incidence matrix cannot be defined without first defining a “graph”. In the method described herein, a subset of general graphs called simple graphs is used. Members of this subcategory are further defined as follows.

A simple graph G is a pair (V, E) where V represents the set of vertices of the simple graph and E is a set of un-oriented edges of the simple graph. An edge is defined as a 2-component combination of members of the set of vertices. In other words, in a simple graph G there are some pairs of vertices that are connected by an edge. In our application, a graph is based on nucleic acid sequences generated using sequence templates: vertices represent DNA sequences, and edges represent a relative property of any pair of sequences.

The incidence matrix is a mathematical object that allows one to describe any given graph. For the subset of simple graphs used herein, the simple graph $G = (V, E)$, and for a pre-selected and fixed ordering of vertices, $V = \{v_1, v_2, \ldots, v_n\}$, elements of the incidence matrix $A(G) = [a_{ij}]$ are defined by the following rules (in standard graph-theory usage, this vertex-by-vertex matrix is usually called the adjacency matrix):

(1) $a_{ij} = 1$ for any pair of vertices $\{v_i, v_j\}$ that is a member of the set of edges; and
(2) $a_{ij} = 0$ for any pair of vertices $\{v_i, v_j\}$ that is not a member of the set of edges.

This is an exact, unequivocal definition of the incidence matrix. In effect, one selects the indices $1, 2, \ldots, n$ of the vertices and then forms an $n \times n$ square matrix with elements $a_{ij} = 1$ if the vertices $v_i$ and $v_j$ are connected by an edge and $a_{ij} = 0$ if the vertices $v_i$ and $v_j$ are not connected by an edge.
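A sketch of how the matrix defined by rules (1) and (2) could be built for a set of sequences, assuming that an edge joins two sequences when their maximum homology does not exceed a chosen threshold. The threshold value, the toy symbolic sequences and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def max_homology(s1: str, s2: str) -> Fraction:
    """Ungapped sliding maximum homology (compact form)."""
    best = max(
        sum(1 for j, b in enumerate(s2) if 0 <= j + k < len(s1) and s1[j + k] == b)
        for k in range(-(len(s2) - 1), len(s1))
    )
    return Fraction(best, min(len(s1), len(s2)))


def edge_matrix(seqs: list[str], threshold: Fraction) -> list[list[int]]:
    """a[i][j] = 1 if sequences i and j do not violate the threshold rule.

    The diagonal stays 0, since a sequence aligned with itself violates it.
    """
    n = len(seqs)
    a = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if max_homology(seqs[i], seqs[j]) <= threshold:
            a[i][j] = a[j][i] = 1
    return a


seqs = ["AABBCC", "BDCDDD", "AABBCA"]  # symbolic toy sequences
for row in edge_matrix(seqs, Fraction(1, 2)):
    print(row)
```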

To define the term “class property” as used herein, the term “complete simple graph” or “clique” must first be defined. The complete simple graph is required because all sequences that result from the method described herein should collectively share the relative property of any pair of sequences defining an edge of graph G, for example not violating the threshold rule, that is, not having a “maximum simple homology” greater than a predetermined amount, whichever pair of sequences is chosen from the final set. It is possible that additional “local” rules, based on known or empirically determined behavior of particular nucleotides or nucleotide sequences, are applied to sequence pairs in addition to the basic threshold rule.

In the language of a simple graph, $G = (V, E)$, this means that in the final graph there should be no pair of vertices (no sequence pair) not connected by an edge (because an edge means that the sequences represented by $v_i$ and $v_j$ do not violate the threshold rule).

Because the incidence matrix of any simple graph can be generated by the above definition of its elements, the consequence of defining a simple complete graph is that the corresponding incidence matrix for a simple complete graph will have all off-diagonal elements equal to 1 and all diagonal elements equal to 0. This is because if one aligns a sequence with itself, the threshold rule is of course violated, and all other sequences are connected by an edge.
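The matrix property of a complete simple graph stated above can be checked directly; a minimal sketch (the function name is illustrative):

```python
def is_complete(a: list[list[int]]) -> bool:
    """True if `a` is the matrix of a complete simple graph (a clique):
    every off-diagonal element is 1 and every diagonal element is 0."""
    n = len(a)
    return all(a[i][j] == (0 if i == j else 1) for i in range(n) for j in range(n))


print(is_complete([[0, 1, 1], [1, 0, 1], [1, 1, 0]]))
print(is_complete([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
```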

For any simple graph, there might be a complete subgraph. First, the definition of a subgraph of a graph is as follows. The subgraph $G_s = (V_s, E_s)$ of a simple graph $G = (V, E)$ is a simple graph whose vertex set $V_s$ is a subset of the set $V$ of vertices, and the inclusion of the set $V_s$ into the set $V$ is an immersion (a mathematical term); in other words, $G_s$ is the subgraph induced by $V_s$. This means that one generates a subgraph $G_s = (V_s, E_s)$ of a simple graph $G$ in two steps. First, select some vertices $V_s$ from $G$. Then select those edges $E_s$ from $G$ that connect the chosen vertices, and do not select edges that connect selected with non-selected vertices.
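Because the chosen edges are exactly those among the chosen vertices, the two-step construction amounts to keeping the chosen rows and columns of the matrix. A minimal sketch (function name and toy matrix are illustrative):

```python
def induced_subgraph(a: list[list[int]], keep: list[int]) -> list[list[int]]:
    """Matrix of the subgraph induced by the vertices listed in `keep`.

    Step 1 chooses the vertices; step 2 keeps exactly the edges among
    them, which here means keeping only the chosen rows and columns.
    """
    return [[a[i][j] for j in keep] for i in keep]


a = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(induced_subgraph(a, [0, 1]))
print(induced_subgraph(a, [0, 2]))
```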

We desire a subgraph of G that is a complete simple graph. By using this property of the complete simple graph generated from the simple graph G of all sequences generated by the template-based algorithm, the pairwise property of any pair of the sequences (violating/non-violating the threshold rule) is converted into a property of all members of the set, termed “the class property”.

Selecting a subgraph of a simple graph G that is a complete simple graph assures that, up to the tests involving the local rules described herein, there are no pairs of sequences in the resulting set that violate the threshold rule, also described above, independent of which pair of sequences in the set is chosen. This feature is called the “desired class property”.
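The text does not say how the complete subgraph is found. As one possible approach, a simple greedy heuristic is sketched below; it always returns a set with the desired class property, but finding the largest such set (the maximum clique problem) is NP-hard in general and is not attempted here. The function name and toy graph are hypothetical.

```python
def greedy_clique(a: list[list[int]]) -> list[int]:
    """Greedily collect vertices joined by an edge to every vertex chosen so far.

    Every pair in the result is connected, so the induced subgraph is
    complete; the result is not guaranteed to be the largest such set.
    """
    chosen: list[int] = []
    for v in range(len(a)):
        if all(a[v][u] == 1 for u in chosen):
            chosen.append(v)
    return chosen


# Toy graph (hypothetical): edges 0-1, 0-2, 1-2 and 2-3.
a = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(greedy_clique(a))
```

The order in which vertices are visited changes which clique is found, so in practice one might try several orderings and keep the largest result.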

The present invention thus includes reducing the potential for cross-hybridization by taking into account local homologies of the sequences, and appears to have greater rigor than known approaches. For example, the method described herein involves sliding one sequence relative to the other in order to form a sequence alignment that would accommodate insertions or deletions (Kane et al., Nucleic Acids Res.; 28, 4552-4557: 2000).

TABLE I

SEQ ID NO(1)  Sequence                   No. Assigned in Example 2(2)
1             GATTTGTATTGATTGAGATTAAAG   1
2             TGATTGTAGTATGTATTGATAAAG   2
3             GATTGTAAGATTTGATAAAGTGTA   3
4             GATTTGAAGATTATTGGTAATGTA   4
5             GATTGATTATTGTGATTTGAATTG   5
6             GATTTGATTGTAAAAGATTGTTGA   6
7             ATTGGTAAATTGGTAAATGAATTG   7
8             ATTGGATTTGATAAAGGTAAATGA
9             GTAAGTAATGAATGTAAAAGGATT   8
10            GATTGATTGATTGATTGATTTGAT
11            TGATGATTAAAGAAAGTGATTGAT
12            AAAGGATTTGATTGATAAAGTGAT
13            TGTAGATTTGTATGTATGTATGAT   10
14            GATTTGATAAAGAAAGGATTGATT
15            GATTAAAGTGATTGATGATTTGTA   11
16            AAAGAAAGAAAGAAAGAAAGTGTA   12
17            TGTAAAAGGATTGATTTGTATGTA
18            AAAGTGTAGATTGATTAAAGAAAG
19            AAAGTTGATTGATTGAAAAGGTAT
20            TTGATTGAGATTGATTTTGAGTAT
21            TGAATTGATGAATGAATGAAGTAT   15
22            GTAATGAAGTATGTATGTAAGTAA
23            TGATGATTTGAATGAAGATTGATT   16
24            TGATAAAGTGATAAAGGATTAAAG   17
25            TGATTTGAGTATTTGAGATTTTGA   18
26            TGTAGTAAGATTGATTAAAGGTAA
27            GTATAAAGGATTGATTTTGAAAAG
28            GTATTTGAGTAAGTAATTGATTGA   19
29            GTAAAAAGTTGAGTATTGAAAAAG
30            GATTTGATAAAGGATTTGTATTGA
31            GATTGTATTGAAGTATTGTAAAAG   20
32            TGATGATTTTGATGAAAAAGTTGA
33            TGATTTGAGATTAAAGAAAGGATT   21
34            TGATTGAATTGAGTAAAAAGGATT   22
35            AAAGTGTAAAAGGATTTGATGTAT
36            AAAGGTATTTGAGATTTGATTGAA
37            AAAGTTGAGATTTGAATGATTGAA   23
38            TGTATTGAAAAGGTATGATTTGAA
39            GTATTGTATTGAAAAGGTAATTGA   24
40            TTGAGTAATGATAAAGTGAAGATT
41            TGAAGATTTGAAGTAATTGAAAAG   25
42            TGAAAAAGTGTAGATTTTGAGTAA   26
43            TGTATGAATGAAGATTTGATTGTA
44            AAAGTTGAGTATTGATTTGAAAAG   27
45            GATTTGTAGATTTGTATTGAGATT
46            AAAGAAAGGATTTGTAGTAAGATT   29
47            GTAAAAAGAAAGGTATAAAGGTAA   30
48            GATTAAAGTTGATTGAAAAGTGAA   31
49            TGAAAAAGGTAATTGATGTATGAA
50            AAAGGATTAAAGTGAAGTAATTGA   33
51            ATGAATTGGTATGTATATGAATGA   34
52            TGAAATGAATGAATGATGAAATTG   35
53            ATTGATTGTGAATGAAATGAATTG   36
54            ATTGAAAGATGAAAAGATGAAAAG   37
55            ATTGTTGAAAAGTGTAATGATTGA   38
56            ATGATGTAATGAAAAGATTGTGTA   39
57            AAAGATTGAAAGATGATGTAATTG
58            ATTGATGAGTATATTGTGTAGTAA   41
59            AAAGATTGTGTAATTGATGATGAA
60            AAAGGTATATTGTGTAATGAGTAA
61            TGTAATGAGTATTGTAATTGAAAG   43
62            GTATAAAGAAAGATTGGTAAATGA   44
63            TTGAGTAATTGAATTGTGAAATGA   45
64            TGTATTGAATGAATTGTTGATGTA   46
65            TGTAATTGGTAAATGAGTAAAAAG
66            TGAATGAAATTGATGAGTATAAAG
67            GTAAGTAAATTGAAAGATTGATGA   49
68            GTAAATGATGATATTGGTATATTG   50
69            ATTGTTGATGATTGATTGAAATGA   51
70            ATTGTGAAGTATAAAGATGATTGA   52
71            ATGAAAAGTTGAGTAAATTGTGAT
72            ATGAATTGAAAGTGATTGAAAAAG   54
73            GTAAATTGATGAAAAGTTGATGAT
74            AAAGTGATGTATATGAGTAAATTG   56
75            GTAATGATAAAGATGATGATATTG   57
76            TTGAAAAGATTGGTAATGATATGA
77            AAAGTGAAAAAGATTGATTGATGA   59
78            ATTGATGAGATTGATTATTGTGTA
79            ATGAGATTATTGGATTTGTAGATT   60
80            TGAAGATTATGAATTGGTAAGATT   61
81            ATTGGATTATGAGATTATGATTGA   62
82            ATTGTTGAATTGGATTAAAGATGA
83            AAAGATGAGTAAGTAAATTGGATT
84            AAAGGTAAGATTATTGATGAAAAG   65
85            ATTGATGAGATTAAAGTTGAATTG
86            GATTATTGGATTATGAAAAGGATT
87            GATTTGTAATTGTTGAGTAAATGA   67
88            AAAGAAAGATTGTTGAGATTATGA   68
89            GTATAAAGGATTTTGAATTGATGA
90            TTGAGATTGTAAATGAATTGTTGA
91            GTATATTGATTGTGTAATGAAAAG
92            TGATATGAATTGGATTATTGGTAT   70
93            ATGAATGATGAATGATGATTATTG
94            ATGAATTGATTGGATTGTAATGAT   71
95            GATTGTAATTGAGTAAATTGATGA
96            GATTATTGGATTAAAGGTAAATGA   72
97            ATTGTTGAATTGATGAGATTTGAT   73
98            GATTATGAGTAAATTGATTGTGAT
99            GATTATTGTTGATGAATGATATTG
100           TGTAAAAGATTGAAAGGTATGATT   75
101           GTATTTAGATGAGTTTGTTAGATT   76
102           TGAAGTTATGTAATAGAAAGTGAT
103           GTATGTATTGTATGTAGTTAATTG   77
104           TGATATAGATAGTTAGATAGATAG   78
105           ATGATGATGTATTGTAGTTATGAA   79
106           TTAGTGAATGTATTAGTTGATGTA
107           GTTAGTTAGATTATTGTTAGTTAG   80
108           GTTAATTGTGTAGTTTGTTATTGA
109           GTTATGAAATAGTGATATTGTTAG
110           ATTGTTAGAAAGTGTAGATTAAAG   81
111           ATGAGTATGTTATTAGTGTATGTA   82
112           TGTAATAGTGAAGTTAGATTGTAT   83
113           ATTGATAGATGATTAGTTAGTTGA   84
114           ATGAGTTTGTTTATGAGATTAAAG
115           TGATGTTTGATTATGATGTAGTAT   85
116           ATGAGTTAGTTATGAATTAGATGA
117           ATTGTTAGTGATGTTAGTAATTAG   86
118           TGATGTAAGTATTGATGTTAGTTT   87
119           GATTGTAAATAGAAAGTGAAGTAA   88
120           ATTGTGTATGAAGTATTGTATGAT
121           ATAGTGATGTTATGAAGATTGTTA
122           TTAGATGAATTGTGAAGTATTTAG   90
123           GTAAGTTATGATTGATGTTATGAA   91
124           GTATTGATGTTTAAAGTGTAATAG   92
125           GATTGTAAGTAAGATTGTATATTG
126           GTTTGTATTTAGATGAATAGAAAG   93
127           GTTTGATTTGTAATAGTGATTGTA
128           TGTATGTAGTATTTAGAAAGATGA
129           ATGAATTGTGATAAAGAAAGTTAG
130           TTAGTGTAGTAAGTTTAAAGTGTA   95
131           GTATGATTGTTTGTAATTAGTGAT
132           GTTTAAAGTTAGTTGAGTTAGTAT   96
133           ATAGTGTATGTAGATTATGAGATT   97
134           TTGAATGATTAGTTGAGTATGATT   98
135           GTATGTAAGTTAGTATGATTTGAA
136           TGTAGTATATTGTTGAATTGTGAT
137           ATAGTGATTGTATGTATGATAAAG
138           TTAGTGATTGATGTATATTGAAAG
139           GTAAGATTATGAGTTATGATGTAA
140           GTTATGAAATTGTTAGTGTAGATT   99
141           GTTAGATTTGTAGTTTAAAGATAG   100
142           TTAGTGATTGAAATGATGTAGATT
143           AAAGTGTAGTTATTAGTTAGTTAG
144           AAAGAAAGTGTATGATGTTATTAG
145           GATTGTATATTGTGTATGATGATT
146           TTGAGATTGTTATGATATGAGTAT
147           ATGAGTATGATTGTTATGATGTTT
148           TGATTTAGTGAAATTGTGTATTAG
149           TGAATGTATGTAGTATGTTTGTTA
150           GTTAGTATTGATGATTATGAGTTA
151           GTATATTGTGATTTAGTTGAGATT
152           GTTAGTTTAAAGTTGAGATTGTTT
153           GTATATTGTTAGATGAGATTTGTA
154           TGATGTATGTTAGTTTATGAATGA
155           TGTAGTATGTAATGTAGTATTTGA
156           AAGAGTTATGTATTGAGTTAGTAT
157           TGTATGATGATTATAGTTGAGTAA
158           ATTGATGAATGAGTTTGTATAAAG
159           TTGAGTTTATGATTAGAAAGAAAG
160           TGATATTGATGAGTTAGTATTGAA
161           ATAGAAAGTGAAATGAGTATGTTA
162           TTGATGTAGATTTGATGTATATAG
163           TTGAGATTATAGTGTAGTTTATAG
164           TGATGTTAGATTGTTTGATTATTG
165           TGTATTAGATAGTGATTTGAATGA
166           GATTATGATGAATGTAGTATGTAA
167           TGAATGATTGATATGAATAGTGTA
168           GTAATGATTTAGTGTATTGAGTTT
169           TGTAGTAATGATTTGATGATAAAG
170           TGAAGATTGTTATTAGTGATATTG
171           GTATTTGAATGATGTAATAGTGTA
172           GTATATGATGTATTAGATTGAAAG
173           AAAGTTAGATTGAAAGTGATAAAG
174           GTAAGATGTTGATATAGAAGATTA   9
175           TAATATGAGATGAAAGTGAATTAG
176           TTAGTGAAGAAGTATAGTTTATTG   13
177           GTAGTTGAGAAGATAGTAATTAAT
178           ATGAGATGATATTTGAGAAGTAAT
179           GATGTGAAGAAGATGAATATATAT
180           AAAGTATAGTAAGATGTATAGTAG   14
181           GAAGTAATATGAGTAGTTGAATAT
182           TTGATAATGTTTGTTTGTTTGTAG   28
183           TGAAGAAGAAAGTATAATGATGAA
184           GTAGATTAGTTTGAAGTGAATAAT   32
185           TATAGTAGTGAAGATGATATATGA
186           TATAATGAGTTGTTAGATATGTTG
187           GTTGTGAAATTAGATGTGAAATAT
188           TAATGTTGTGAATAATGTAGAAAG   40
189           GTTTATAGTGAAATATGAAGATAG   42
190           ATTATGAAGTAAGTTAATGAGAAG   47
191           GATGAAAGTAATGTTTATTGTGAA
192           ATTATTGAGATGTGAAGTTTGTTT   48
193           TGTAGAAGATGAGATGTATAATTA   53
194           TAATTTGAGTTGTGTATATAGTAG
195           TGATATTAGTAAGAAGTTGAATAG
196           GTTAGTTATTGAGAAGTGTATATA   55
197           GTAGTAATGTTAATGAATTAGTAG   58
198           GTTTGTTTGATGTGATTGAATAAT
199           GTAAGTAGTAATTTGAATATGTAG   64
200           GTTTGAAGATATGTTTGAAGTATA
201           ATGATAATTGAAGATGTAATGTTG
202           GTAGATAGTATAGTTGTAATGTTA   66
203           GATGTGAATGTAATATGTTTATAG   69
204           TGAAATTAGTTTGTAAGATGTGTA   74
205           TGTAGTATAAAGTATATGAAGTAG   63
206           ATATGTTGTTGAGTTGATAGTATA   89
207           ATTATTGAGTAGAAAGATAGAAAG   94
208           GTTGTTGAATATTGAATATAGTTG
209           ATGAGAAGTTAGTAATGTAAATAG
210           TGAAATGAGAAGATTAATGAGTTT

(1) Oligonucleotides having SEQ ID NOs: 1 to 100 were used in the experiments of Example 1.
(2) Oligonucleotides used in the experiments of Example 2 are indicated in this column by the numbers assigned to them in those experiments.

All references referred to in this specification are incorporated herein by reference.

The scope of protection sought for the invention described herein is defined by the appended claims. It will also be understood that any elements recited above or in the claims can be combined with the elements of any claim. In particular, elements of a dependent claim can be combined with any element of a claim from which it depends, or with any other compatible element of the invention.

This application claims priority from U.S. Provisional Patent Application Nos. 60/263,710 and 60/303,799, filed Jan. 25, 2001 and Jul. 10, 2001, respectively. Both of these documents are incorporated herein by reference.

1. A composition comprising molecules for use as tags or tag complementswherein each molecule comprises an oligonucleotide selected from a setof oligonucleotides based on a following group of sequences: 1 4 6 6 1 32 4 5 5 2 3 1 8 1 2 3 4 1 7 1 9 8 4 1 1 9 2 6 9 1 2 4 3 9 6 9 8 9 8 10 99 1 2 3 8 10 8 8 7 4 3 1 1 1 1 1 1 2 2 1 3 3 2 2 3 1 2 2 3 2 4 1 4 4 4 21 2 3 3 1 1 1 3 2 2 1 4 3 3 3 3 3 4 4 3 1 1 4 4 3 4 1 1 3 3 3 6 6 6 3 56 6 1 1 6 5 7 6 7 7 7 5 8 7 5 5 8 8 2 1 7 7 1 1 2 3 2 3 1 3 2 6 5 6 1 64 8 1 1 3 8 5 3 1 1 6 3 5 6 8 8 6 6 8 3 6 5 7 3 1 2 3 1 4 6 1 5 7 5 4 32 1 6 7 3 6 2 6 1 3 3 1 2 7 6 8 3 1 3 4 3 1 2 5 3 5 6 1 2 7 3 6 1 7 2 74 6 3 5 1 7 5 4 6 3 8 6 6 8 2 3 7 1 7 1 7 8 6 3 7 3 4 1 6 8 4 7 7 1 2 43 6 5 2 6 3 1 4 1 4 6 1 3 3 1 4 8 1 8 3 3 5 3 8 1 3 6 6 3 7 7 3 8 6 4 73 1 3 7 8 6 10 9 5 5 10 10 7 10 10 10 7 9 9 9 7 7 10 9 9 3 10 3 10 3 9 63 4 10 6 10 4 10 3 9 4 3 9 3 10 4 9 9 10 5 9 4 8 3 9 4 9 10 7 3 5 9 4 108 4 10 5 4 9 3 5 3 3 9 8 10 6 8 6 9 7 10 4 6 10 9 6 4 4 9 8 10 8 3 7 7 910 5 3 8 8 9 3 9 10 8 10 2 9 5 9 9 6 2 2 7 10 9 7 5 3 10 6 10 3 6 8 9 210 9 3 2 7 3 8 9 10 3 6 2 3 2 5 10 8 9 8 2 3 10 2 9 6 3 9 8 2 10 3 7 3 99 10 9 10 1 1 9 4 10 1 9 1 4 1 7 1 10 9 8 1 9 1 10 1 10 6 9 6 9 1 3 10 310 8 8 9 1 3 8 1 9 10 3 9 10 1 3 6 9 1 9 1 10 3 1 1 4 9 6 8 10 3 3 9 6 110 5 3 1 6 9 10 6 1 8 10 9 6 5 9 9 4 10 3 2 10 9 1 9 5 10 10 7 2 1 9 109 9 1 8 2 1 8 6 8 9 10 1 9 1 3 8 10 9 6 9 10 1 2 1 10 8 9 9 2 1 9 6 7 29 4 3 9 3 5 1 5 11 10 14 12 1 7 12 4 13 3 2 5 5 4 4 12 9 2 13 13 11 1313 10 2 5 4 12 7 11 7 4 11 6 4 12 12 1 9 11 11 12 9 4 14 12 6 12 7 13 29 11 9 11 3 4 1 3 10 5 12 11 4 4 4 13 7 12 1 5 9 13 10 11 11 6 10 14 1410 1 3 2 14 1 10 4 5 10 12 12 7 11 10 9 11 2 12 8 11 2 8 5 2 12 14 1 813 3 7 8 9 4 7 5 4 2 13 2 12 7 1 12 11 10 9 7 5 11 8 12 2 2 12 7 5 2 143 4 13 1 8 8 1 5 9 14 5 11 10 13 3 14 1 4 13 2 4 4 4 5 11 3 10 10 9 2 33 11 11 4 8 14 3 4 5 1 14 8 11 2 14 3 11 6 12 5 13 4 4 1 10 1 6 10 11 65 1 5 8 12 5 1 7 4 5 9 6 9 2 13 2 4 4 2 3 11 2 2 5 9 3 8 1 10 12 2 8 127 9 
11 4 1 12 1 4 14 3 13 11 2 7 10 4 1 3 4 12 11 11 11 3 3 4 2 12 11 15 9 4 2 1 6 1 12 2 10 5 10 5 1 12 2 14 2 11 7 9 4 11 7 4 4 5 14 12 12 52 1 10 12 5 9 2 11 6 1 12 14 3 6 1 14 5 9 11 10 1 4 2 5 12 14 10 10 4 58 4 5 6 10 12 4 6 12 5 4 2 1 13 6 8 9 10 10 14 5 3 6 14 10 11 3 3 2 9 1012 5 7 13 3 7 10 5 12 6 4 1 2 5 13 6 1 13 4 14 13 2 12 1 14 1 9 4 11 132 6 10 1 10 7 4 5 8 7 2 2 10 13 4 8 2 11 4 6 14 4 8 2 6 2 3 7 1 12 11 29 5 6 10 4 13 4 5 10 4 11 9 3 3 11 9 3 2 3 8 15 6 20 17 19 21 10 15 3 711 11 7 17 20 14 9 16 6 17 13 21 21 10 15 22 6 17 21 15 7 17 10 22 22 320 8 15 20 16 17 21 10 16 6 22 6 21 14 14 14 16 7 17 3 20 10 7 16 19 1417 7 21 20 16 7 15 22 10 20 10 18 11 22 18 18 7 19 15 7 22 21 18 7 21 163 14 13 7 22 17 13 19 7 8 12 10 17 15 3 21 14 9 7 19 6 15 7 14 14 4 1710 15 20 19 21 6 18 4 20 16 2 19 8 17 6 13 12 12 6 17 4 20 16 21 12 1019 16 14 14 15 2 7 21 8 16 21 6 22 16 14 17 22 14 17 20 10 21 7 15 21 1816 13 20 18 21 12 15 7 4 22 14 13 7 19 14 8 15 4 4 5 3 20 7 16 22 18 618 13 20 19 6 16 3 13 3 18 6 22 7 20 18 10 17 11 21 8 13 7 10 17 19 1014

wherein: (A) each of 1 to 22 is a 4mer selected from the group of 4mers consisting of WWWW, WWWX, WWWY, WWXW, WWXX, WWXY, WWYW, WWYX, WWYY, WXWW, WXWX, WXWY, WXXW, WXXX, WXXY, WXYW, WXYX, WXYY, WYWW, WYWX, WYWY, WYXW, WYXX, WYXY, WYYW, WYYX, WYYY, XWWW, XWWX, XWWY, XWXW, XWXX, XWXY, XWYW, XWYX, XWYY, XXWW, XXWX, XXWY, XXXW, XXXX, XXXY, XXYW, XXYX, XXYY, XYWW, XYWX, XYWY, XYXW, XYXX, XYXY, XYYW, XYYX, XYYY, YWWW, YWWX, YWWY, YWXW, YWXX, YWXY, YWYW, YWYX, YWYY, YXWW, YXWX, YXWY, YXXW, YXXX, YXXY, YXYW, YXYX, YXYY, YYWW, YYWX, YYWY, YYXW, YYXX, YYXY, YYYW, YYYX, and YYYY, and (B) each of 1 to 22 is selected so as to be different from all of the others of 1 to 22; (C) each of W, X and Y is a base in which: (i) (a) W=one of A, T/U, G, and C, X=one of A, T/U, G, and C, Y=one of A, T/U, G, and C, and each of W, X and Y is selected so as to be different from all of the others of W, X and Y, (b) an unselected said base of (i)(a) can be substituted any number of times for any one of W, X and Y, or (ii) (a) W=G or C, X=A or T/U, Y=A or T/U, and X≠Y, and (b) a base not selected in (ii)(a) can be inserted into each sequence at one or more locations, the location of each insertion being the same in all the sequences; (D) up to three bases can be inserted at any location of any of the sequences or up to three bases can be deleted from any of the sequences; (E) all of the sequences of a said group of oligonucleotides are read 5′ to 3′ or are read 3′ to 5′; and wherein each oligonucleotide of a said set has a sequence of at least ten contiguous bases of the sequence on which it is based, provided that: (F) (I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.1 and 0.40 and said quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.2; and (II) for any phantom sequence generated from any pair of first and second sequences of the set, L₁ and L₂ in length, respectively, by selection from the first and second sequences of identical bases in identical sequence with each other: (i) any consecutive sequence of bases in the phantom sequence which is identical to a consecutive sequence of bases in each of the first and second sequences from which it is generated is less than ((¾×L)−1) bases in length; (ii) the phantom sequence, if greater than or equal to (⅚×L) in length, contains at least three insertions/deletions or mismatches when compared to the first and second sequences from which it is generated; and (iii) the phantom sequence is not greater than or equal to ((11/12)×L) in length; where L=L₁ or, if L₁≠L₂, L is the greater of L₁ and L₂; and wherein any base present may be substituted by an analogue thereof.
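Conditions (F)(II)(i) and (F)(II)(iii) can be checked for a pair of candidate sequences with standard dynamic programming. The sketch below assumes that the longest phantom sequence of a pair is their longest common subsequence, and that the longest run described in (i) is their longest common substring; both readings are an interpretation of the claim language, not something it states. Condition (F)(II)(ii), which counts insertions/deletions or mismatches, is not modeled, and the function names are hypothetical.

```python
def longest_common_substring(a, b):
    """Length of the longest run of consecutive bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, cur[j])
        prev = cur
    return best


def longest_common_subsequence(a, b):
    """Length of the longest phantom sequence: identical bases taken in order from both."""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[len(b)]


def passes_phantom_checks(a, b):
    """Check (F)(II)(i) and (F)(II)(iii) for one pair; L is the longer length."""
    L = max(len(a), len(b))
    rule_i = longest_common_substring(a, b) < 0.75 * L - 1   # shared run < (3/4 x L) - 1
    rule_iii = longest_common_subsequence(a, b) < 11 * L / 12  # phantom < (11/12) x L
    return rule_i and rule_iii


s = "GATTTGATAAAGTGTAGTATTTGA"  # illustrative 24mer
print(passes_phantom_checks(s, s))         # False: a sequence fully overlaps itself
print(passes_phantom_checks(s, "C" * 24))  # True: no shared bases at all
```

For two 24mers, L=24, so (i) requires every shared run to be shorter than 17 bases and (iii) requires every phantom sequence to be shorter than 22 bases. A full screen of a set would apply the check to every pair.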
 2. The composition of claim 1, wherein the composition includes at least ten said molecules, or at least eleven said molecules, or at least twelve said molecules, or at least thirteen said molecules, or at least fourteen said molecules, or at least fifteen said molecules, or at least sixteen said molecules, or at least seventeen said molecules, or at least eighteen said molecules, or at least nineteen said molecules, or at least twenty said molecules, or at least twenty-one said molecules, or at least twenty-two said molecules, or at least twenty-three said molecules, or at least twenty-four said molecules, or at least twenty-five said molecules, or at least twenty-six said molecules, or at least twenty-seven said molecules, or at least twenty-eight said molecules, or at least twenty-nine said molecules, or at least thirty said molecules, or at least thirty-one said molecules, or at least thirty-two said molecules, or at least thirty-three said molecules, or at least thirty-four said molecules, or at least thirty-five said molecules, or at least thirty-six said molecules, or at least thirty-seven said molecules, or at least thirty-eight said molecules, or at least thirty-nine said molecules, or at least forty said molecules, or at least forty-one said molecules, or at least forty-two said molecules, or at least forty-three said molecules, or at least forty-four said molecules, or at least forty-five said molecules, or at least forty-six said molecules, or at least forty-seven said molecules, or at least forty-eight said molecules, or at least forty-nine said molecules, or at least fifty said molecules, or at least sixty said molecules, or at least seventy said molecules, or at least eighty said molecules, or at least ninety said molecules, or at least one hundred said molecules, or at least one hundred and ten said molecules, or at least one hundred and twenty said molecules, or at least one hundred and thirty said molecules, or at least one hundred and forty said molecules, or at least one hundred and fifty said molecules, or at least one hundred and sixty said molecules, or at least one hundred and seventy said molecules, or at least one hundred and eighty said molecules, or at least one hundred and ninety said molecules, or at least two hundred said molecules.
 3. The composition of claim 1, wherein said set of oligonucleotides is based on the sequences tested in Example 2, as set out in Table I.
 4. The composition of claim 3, wherein the composition includes at least ten said molecules, or at least eleven said molecules, or at least twelve said molecules, or at least thirteen said molecules, or at least fourteen said molecules, or at least fifteen said molecules, or at least sixteen said molecules, or at least seventeen said molecules, or at least eighteen said molecules, or at least nineteen said molecules, or at least twenty said molecules, or at least twenty-one said molecules, or at least twenty-two said molecules, or at least twenty-three said molecules, or at least twenty-four said molecules, or at least twenty-five said molecules, or at least twenty-six said molecules, or at least twenty-seven said molecules, or at least twenty-eight said molecules, or at least twenty-nine said molecules, or at least thirty said molecules, or at least thirty-one said molecules, or at least thirty-two said molecules, or at least thirty-three said molecules, or at least thirty-four said molecules, or at least thirty-five said molecules, or at least thirty-six said molecules, or at least thirty-seven said molecules, or at least thirty-eight said molecules, or at least thirty-nine said molecules, or at least forty said molecules, or at least forty-one said molecules, or at least forty-two said molecules, or at least forty-three said molecules, or at least forty-four said molecules, or at least forty-five said molecules, or at least forty-six said molecules, or at least forty-seven said molecules, or at least forty-eight said molecules, or at least forty-nine said molecules, or at least fifty said molecules, or at least sixty said molecules, or at least seventy said molecules, or at least eighty said molecules, or at least ninety said molecules, or at least one hundred said molecules.
 5. The composition of claim 1, wherein: (G) for the group of 24mer sequences in which each 1=GATT, each 2=TGAT, each 3=AAAG, each 4=TGTA, each 5=GTAT, each 6=TTGA, each 7=TGAA, each 8=GTAA, each 9=ATTG, each 10=ATGA, each 11=TTAG, each 12=GTTA, each 13=ATAG, each 14=GTTT, each 15=GATG, each 16=GTAG, each 17=GAAG, each 18=GTTG, each 19=ATTA, each 20=TATA, each 21=TAAT and each 22=ATAT, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement.
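The numeral rows of claim 1 become concrete 24mers once each numeral is replaced by its 4mer. The sketch below uses the assignments recited in (G) of claim 5. The helper names and the example row [1, 2, 3, 4, 5, 6] are hypothetical and are not taken from the table in claim 1; it also assumes that a tag complement is written as the reverse complement, 5′ to 3′, over a DNA alphabet.

```python
# 4mer assigned to each numeral 1-22 in (G) of claim 5.
CLAIM5_4MERS = {
    1: "GATT", 2: "TGAT", 3: "AAAG", 4: "TGTA", 5: "GTAT", 6: "TTGA",
    7: "TGAA", 8: "GTAA", 9: "ATTG", 10: "ATGA", 11: "TTAG", 12: "GTTA",
    13: "ATAG", 14: "GTTT", 15: "GATG", 16: "GTAG", 17: "GAAG", 18: "GTTG",
    19: "ATTA", 20: "TATA", 21: "TAAT", 22: "ATAT",
}

# Watson-Crick pairing used to build complements (DNA alphabet assumed).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def expand(numerals, table=CLAIM5_4MERS):
    """Concatenate the 4mers named by a row of numerals (six numerals give a 24mer)."""
    return "".join(table[n] for n in numerals)


def tag_complement(seq):
    """Reverse complement of seq, written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(seq))


tag = expand([1, 2, 3, 4, 5, 6])  # hypothetical row, not from the claim 1 table
print(tag, tag_complement(tag))
```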
 6. The composition of claim 3, wherein: (G) for the group of 24mer sequences in which each 1=GATT, each 2=TGAT, each 3=AAAG, each 4=TGTA, each 5=GTAT, each 6=TTGA, each 7=TGAA, each 8=GTAA, each 9=ATTG, each 10=ATGA, each 11=TTAG, each 12=GTTA, each 13=ATAG, each 14=GTTT, each 15=GATG, each 16=GTAG, each 17=GAAG, each 18=GTTG, each 19=ATTA, each 20=TATA, each 21=TAAT and each 22=ATAT, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement.
 7. The composition of claim 5 wherein, in (G) under said defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 30% of the degree of hybridization between said sequence and its complement, the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 10, more preferably between 1 and 9, and more preferably between 1 and 8.
 8. The composition of claim 6 wherein, in (G) under said defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 30% of the degree of hybridization between said sequence and its complement, the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 10, more preferably between 1 and 9, and more preferably between 1 and 8.
 9. The composition of claim 7 wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 25%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 20%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 15%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 11%.
 10. The composition of claim 8 wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 25%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 20%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 15%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 11%.
 11. The composition of claim 7 wherein under said defined set of conditions of (G), the maximum degree of hybridization between a sequence and a complement of any other sequence of the set is no more than 15% greater than the maximum degree of hybridization between a sequence and any complement of a different sequence of the said group of 24mer sequences, more preferably no more than 10% greater, more preferably no more than 5% greater.
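The thresholds in claims 5 to 10 compare cross-hybridization with perfect-match hybridization. A minimal sketch, assuming the measurements are arranged as a square matrix in which signal[i][j] is the degree of hybridization between sequence i and the complement of sequence j, so the diagonal holds the perfect matches. It also assumes that "varies by a factor of between 1 and 10" means the largest perfect-match signal divided by the smallest is at most 10. The function name, the matrix layout and the example values are hypothetical, not data from the document.

```python
def meets_thresholds(signal, max_cross_fraction=0.30, max_fold=10.0):
    """True if, in every row, the largest cross-hybridization signal is at most
    max_cross_fraction of that row's perfect-match signal, and the perfect-match
    signals differ by no more than a factor of max_fold."""
    n = len(signal)
    perfect = [signal[i][i] for i in range(n)]
    for i in range(n):
        worst_cross = max(signal[i][j] for j in range(n) if j != i)
        if worst_cross > max_cross_fraction * perfect[i]:
            return False
    return max(perfect) / min(perfect) <= max_fold


# Hypothetical signals for three sequences (rows) against three complements (columns).
example = [
    [100.0, 10.0, 5.0],
    [8.0, 50.0, 12.0],
    [3.0, 4.0, 80.0],
]
print(meets_thresholds(example))        # True  (30% limit of claim 5, factor 10 of claim 7)
print(meets_thresholds(example, 0.20))  # False (20% limit of claim 9; row 2 reaches 24%)
```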
 12. The composition of claim 8 wherein under said defined set of conditions of (G), the maximum degree of hybridization between a sequence and a complement of any other sequence of the set is no more than 15% greater than the maximum degree of hybridization between a sequence and any complement of a different sequence of the said group of 24mer sequences, more preferably no more than 10% greater, more preferably no more than 5% greater.
 13. The composition of claim 9 wherein under said defined set of conditions of (G), the maximum degree of hybridization between a sequence and a complement of any other sequence of the set is no more than 15% greater than the maximum degree of hybridization between a sequence and any complement of a different sequence of the said group of 24mer sequences, more preferably no more than 10% greater, more preferably no more than 5% greater.
 14. The composition of claim 10 wherein under said defined set of conditions of (G), the maximum degree of hybridization between a sequence and a complement of any other sequence of the set is no more than 15% greater than the maximum degree of hybridization between a sequence and any complement of a different sequence of the said group of 24mer sequences, more preferably no more than 10% greater, more preferably no more than 5% greater.
 15. The composition of claim 5, wherein said defined set of conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C.
 16. The composition of claim 6, wherein said defined set of conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C.
 17. The composition of claim 15 wherein, in (G) said defined set of conditions includes the group of 24mer sequences of (G) being covalently linked to beads.
 18. The composition of claim 16 wherein, in (G) said defined set of conditions includes the group of 24mer sequences of (G) being covalently linked to beads.
 19. The composition of claim 17 or claim 18 wherein, in (G) for the group of 24mers the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 15% of the degree of hybridization between said sequence and its complement and the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 9, and for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 20% of the degree of hybridization of the oligonucleotide and its complement.
 20. The composition of claim 1 or claim 3, wherein each of the 4mers represented by numerals 1 to 22 is selected from the group of 4mers consisting of WXXX, WXXY, WXYX, WXYY, WYXX, WYXY, WYYX, WYYY, XWXX, XWXY, XWYX, XWYY, XXWX, XXWY, XXXW, XXYW, XYWX, XYWY, XYXW, XYYW, YWXX, YWXY, YWYX, YWYY, YXWX, YXWY, YXXW, YXYW, YYWX, YYWY, YYXW, and YYYW.
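The 32 4mers listed in claim 20 are exactly the 4mers over W, X and Y that contain a single W. The sketch below checks that observation by enumerating all 81 4mers of (A) in claim 1 with `itertools.product`; the constant and function names are illustrative only.

```python
from itertools import product

# The 32 4mers listed in claim 20, in the order given there.
CLAIM20_4MERS = (
    "WXXX WXXY WXYX WXYY WYXX WYXY WYYX WYYY XWXX XWXY XWYX XWYY "
    "XXWX XXWY XXXW XXYW XYWX XYWY XYXW XYYW YWXX YWXY YWYX YWYY "
    "YXWX YXWY YXXW YXYW YYWX YYWY YYXW YYYW"
).split()

# All 81 (3**4) 4mers over W, X and Y, as enumerated in (A) of claim 1.
ALL_4MERS = ["".join(p) for p in product("WXY", repeat=4)]


def single_w_4mers():
    """4mers from ALL_4MERS containing exactly one W, in lexicographic order."""
    return [m for m in ALL_4MERS if m.count("W") == 1]


print(len(ALL_4MERS), len(single_w_4mers()))  # 81 32
print(single_w_4mers() == CLAIM20_4MERS)      # True
```

Under claim 46, W is G or C while X and Y are A or T/U, so every 4mer in this list, and every 24mer built from six of them, has a G+C quotient of exactly 0.25.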
 21. The composition of claim 20, wherein all of the 4mers represented by numeral 1 are identical to each other, all of the 4mers represented by numeral 2 are identical to each other, all of the 4mers represented by numeral 3 are identical to each other, all of the 4mers represented by numeral 4 are identical to each other, all of the 4mers represented by numeral 5 are identical to each other, all of the 4mers represented by numeral 6 are identical to each other, all of the 4mers represented by numeral 7 are identical to each other, all of the 4mers represented by numeral 8 are identical to each other, all of the 4mers represented by numeral 9 are identical to each other, all of the 4mers represented by numeral 10 are identical to each other, all of the 4mers represented by numeral 11 are identical to each other, all of the 4mers represented by numeral 12 are identical to each other, all of the 4mers represented by numeral 13 are identical to each other, all of the 4mers represented by numeral 14 are identical to each other, all of the 4mers represented by numeral 15 are identical to each other, all of the 4mers represented by numeral 16 are identical to each other, all of the 4mers represented by numeral 17 are identical to each other, all of the 4mers represented by numeral 18 are identical to each other, all of the 4mers represented by numeral 19 are identical to each other, all of the 4mers represented by numeral 20 are identical to each other, all of the 4mers represented by numeral 21 are identical to each other, and all of the 4mers represented by numeral 22 are identical to each other.
 22. The composition of claim 20, wherein at least one of the 4mers represented by the numeral 1 has the sequence WXYY, at least one of the 4mers represented by the numeral 2 has the sequence YWXY, at least one of the 4mers represented by the numeral 3 has the sequence XXXW, at least one of the 4mers represented by the numeral 4 has the sequence YWYX, at least one of the 4mers represented by the numeral 5 has the sequence WYXY, at least one of the 4mers represented by the numeral 6 has the sequence YYWX, at least one of the 4mers represented by the numeral 7 has the sequence YWXX, at least one of the 4mers represented by the numeral 8 has the sequence WYXX, at least one of the 4mers represented by the numeral 9 has the sequence XYYW, at least one of the 4mers represented by the numeral 10 has the sequence XYWX, at least one of the 4mers represented by the numeral 11 has the sequence YYXW, at least one of the 4mers represented by the numeral 12 has the sequence WYYX, at least one of the 4mers represented by the numeral 13 has the sequence XYXW, at least one of the 4mers represented by the numeral 14 has the sequence WYYY, at least one of the 4mers represented by the numeral 15 has the sequence WXYW, at least one of the 4mers represented by the numeral 16 has the sequence WYXW, at least one of the 4mers represented by the numeral 17 has the sequence WXXW, at least one of the 4mers represented by the numeral 18 has the sequence WYYW, at least one of the 4mers represented by the numeral 19 has the sequence XYYX, at least one of the 4mers represented by the numeral 20 has the sequence YXYX, at least one of the 4mers represented by the numeral 21 has the sequence YXXY, and at least one of the 4mers represented by the numeral 22 has the sequence XYXY.
 23. The composition of claim 22, wherein each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, each 10=XYWX, each 11=YYXW, each 12=WYYX, each 13=XYXW, each 14=WYYY, each 15=WXYW, each 16=WYXW, each 17=WXXW, each 18=WYYW, each 19=XYYX, each 20=YXYX, each 21=YXXY and each 22=XYXY.
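Substituting the bases of claim 48 (W=G, X=A, Y=T/U, with T used for DNA) into the patterns of claim 23 reproduces the 4mers assigned to numerals 1 to 22 in (G) of claim 5. The sketch below performs that substitution with `str.translate`; the names are illustrative only.

```python
# Patterns assigned to numerals 1-22 in claim 23.
CLAIM23_PATTERNS = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY", 6: "YYWX",
    7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX", 11: "YYXW", 12: "WYYX",
    13: "XYXW", 14: "WYYY", 15: "WXYW", 16: "WYXW", 17: "WXXW", 18: "WYYW",
    19: "XYYX", 20: "YXYX", 21: "YXXY", 22: "XYXY",
}


def instantiate(pattern, w="G", x="A", y="T"):
    """Replace the placeholders W, X and Y with concrete bases (claim 48 by default)."""
    return pattern.translate(str.maketrans({"W": w, "X": x, "Y": y}))


dna = {n: instantiate(p) for n, p in CLAIM23_PATTERNS.items()}
rna = {n: instantiate(p, y="U") for n, p in CLAIM23_PATTERNS.items()}
print(dna[1], dna[15], rna[1])  # GATT GATG GAUU
```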
 24. The composition of claim 1, wherein a said group of sequences is based on the sequences having sequence identifiers 1 to 173 as set out in Table IA, and wherein each of the 4mers represented by numerals 1 to 14 in (A) is selected from the group of 4mers consisting of WXYY, YWXY, XXXW, YWYX, WYXY, YYWX, YWXX, WYXX, XYYW, XYWX, YYXW, WYYX, XYXW, and WYYY.
 25. The composition of claim 24, wherein the composition includes at least ten said molecules, or at least eleven said molecules, or at least twelve said molecules, or at least thirteen said molecules, or at least fourteen said molecules, or at least fifteen said molecules, or at least sixteen said molecules, or at least seventeen said molecules, or at least eighteen said molecules, or at least nineteen said molecules, or at least twenty said molecules, or at least twenty-one said molecules, or at least twenty-two said molecules, or at least twenty-three said molecules, or at least twenty-four said molecules, or at least twenty-five said molecules, or at least twenty-six said molecules, or at least twenty-seven said molecules, or at least twenty-eight said molecules, or at least twenty-nine said molecules, or at least thirty said molecules, or at least thirty-one said molecules, or at least thirty-two said molecules, or at least thirty-three said molecules, or at least thirty-four said molecules, or at least thirty-five said molecules, or at least thirty-six said molecules, or at least thirty-seven said molecules, or at least thirty-eight said molecules, or at least thirty-nine said molecules, or at least forty said molecules, or at least forty-one said molecules, or at least forty-two said molecules, or at least forty-three said molecules, or at least forty-four said molecules, or at least forty-five said molecules, or at least forty-six said molecules, or at least forty-seven said molecules, or at least forty-eight said molecules, or at least forty-nine said molecules, or at least fifty said molecules, or at least sixty said molecules, or at least seventy said molecules, or at least eighty said molecules, or at least ninety said molecules, or at least one hundred said molecules, or at least one hundred and ten said molecules, or at least one hundred and twenty said molecules, or at least one hundred and thirty said molecules, or at least one hundred and forty said molecules, or at least one hundred and fifty said molecules, or at least one hundred and sixty said molecules, or at least one hundred and seventy said molecules, or at least one hundred and eighty said molecules, or at least one hundred and ninety said molecules, or at least two hundred said molecules.
 26. The composition of claim 24, wherein: (G) for the group of 24mer sequences in which each 1=GATT, each 2=TGAT, each 3=AAAG, each 4=TGTA, each 5=GTAT, each 6=TTGA, each 7=TGAA, each 8=GTAA, each 9=ATTG, each 10=ATGA, each 11=TTAG, each 12=GTTA, each 13=ATAG, each 14=GTTT, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement.
 27. The composition of claim 26 wherein, in (G) under said defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 30% of the degree of hybridization between said sequence and its complement, the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 10, more preferably between 1 and 9, and more preferably between 1 and 8.
 28. The composition of claim 27 wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 25%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 20%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 15%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 11%.
 29. The composition of claim 27 wherein under said defined set of conditions of (G), the maximum degree of hybridization between a sequence and a complement of any other sequence of the set is no more than 15% greater than the maximum degree of hybridization between a sequence and any complement of a different sequence of the said group of 24mer sequences, more preferably no more than 10% greater, more preferably no more than 5% greater.
 30. The composition of claim 26, wherein said defined set of conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C.
 31. The composition of claim 30 wherein, in (G) said defined set of conditions includes the group of 24mer sequences of (G) being covalently linked to beads.
 32. The composition of claim 24, wherein all of the 4mers represented by numeral 1 are identical to each other, all of the 4mers represented by numeral 2 are identical to each other, all of the 4mers represented by numeral 3 are identical to each other, all of the 4mers represented by numeral 4 are identical to each other, all of the 4mers represented by numeral 5 are identical to each other, all of the 4mers represented by numeral 6 are identical to each other, all of the 4mers represented by numeral 7 are identical to each other, all of the 4mers represented by numeral 8 are identical to each other, all of the 4mers represented by numeral 9 are identical to each other, all of the 4mers represented by numeral 10 are identical to each other, all of the 4mers represented by numeral 11 are identical to each other, all of the 4mers represented by numeral 12 are identical to each other, all of the 4mers represented by numeral 13 are identical to each other, and all of the 4mers represented by numeral 14 are identical to each other.
 33. The composition of claim 24, wherein at least one of the 4mers represented by the numeral 1 has the sequence WXYY, at least one of the 4mers represented by the numeral 2 has the sequence YWXY, at least one of the 4mers represented by the numeral 3 has the sequence XXXW, at least one of the 4mers represented by the numeral 4 has the sequence YWYX, at least one of the 4mers represented by the numeral 5 has the sequence WYXY, at least one of the 4mers represented by the numeral 6 has the sequence YYWX, at least one of the 4mers represented by the numeral 7 has the sequence YWXX, at least one of the 4mers represented by the numeral 8 has the sequence WYXX, at least one of the 4mers represented by the numeral 9 has the sequence XYYW, at least one of the 4mers represented by the numeral 10 has the sequence XYWX, at least one of the 4mers represented by the numeral 11 has the sequence YYXW, at least one of the 4mers represented by the numeral 12 has the sequence WYYX, at least one of the 4mers represented by the numeral 13 has the sequence XYXW, and at least one of the 4mers represented by the numeral 14 has the sequence WYYY.
 34. The composition of claim 33, wherein each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, each 10=XYWX, each 11=YYXW, each 12=WYYX, each 13=XYXW, and each 14=WYYY.
 35. The composition of claim 1, wherein a said group of sequences is based on those sequences having sequence identifiers 1 to 100 as set out in Table IA and wherein each of the 4mers represented by numerals 1 to 10 in (A) is selected from the group of 4mers consisting of WXYY, YWXY, XXXW, YWYX, WYXY, YYWX, YWXX, WYXX, XYYW, and XYWX.
 36. The composition of claim 35, wherein the composition includes at least ten said molecules, or at least eleven said molecules, or at least twelve said molecules, or at least thirteen said molecules, or at least fourteen said molecules, or at least fifteen said molecules, or at least sixteen said molecules, or at least seventeen said molecules, or at least eighteen said molecules, or at least nineteen said molecules, or at least twenty said molecules, or at least twenty-one said molecules, or at least twenty-two said molecules, or at least twenty-three said molecules, or at least twenty-four said molecules, or at least twenty-five said molecules, or at least twenty-six said molecules, or at least twenty-seven said molecules, or at least twenty-eight said molecules, or at least twenty-nine said molecules, or at least thirty said molecules, or at least thirty-one said molecules, or at least thirty-two said molecules, or at least thirty-three said molecules, or at least thirty-four said molecules, or at least thirty-five said molecules, or at least thirty-six said molecules, or at least thirty-seven said molecules, or at least thirty-eight said molecules, or at least thirty-nine said molecules, or at least forty said molecules, or at least forty-one said molecules, or at least forty-two said molecules, or at least forty-three said molecules, or at least forty-four said molecules, or at least forty-five said molecules, or at least forty-six said molecules, or at least forty-seven said molecules, or at least forty-eight said molecules, or at least forty-nine said molecules, or at least fifty said molecules, or at least sixty said molecules, or at least seventy said molecules, or at least eighty said molecules, or at least ninety said molecules, or at least one hundred said molecules.
 37. The composition of claim 35, wherein: (G) for the group of 24mer sequences in which each 1=GATT, each 2=TGAT, each 3=AAAG, each 4=TGTA, each 5=GTAT, each 6=TTGA, each 7=TGAA, each 8=GTAA, each 9=ATTG, each 10=ATGA, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement.
 38. The composition of claim 37 wherein, in (G) under said defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence does not exceed 30% of the degree of hybridization between said sequence and its complement, the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 10, more preferably between 1 and 9, and more preferably between 1 and 8.
 39. The composition of claim 38 wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 25%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 20%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 15%, more preferably wherein the maximum degree of hybridization in (G) between a sequence and any complement of a different sequence does not exceed 11%.
 40. The composition of claim 38 or claim 39 wherein under said defined set of conditions of (G), the maximum degree of hybridization between a sequence and a complement of any other sequence of the set is no more than 15% greater than the maximum degree of hybridization between a sequence and any complement of a different sequence of the said group of 24mer sequences, more preferably no more than 10% greater, more preferably no more than 5% greater.
 41. The composition of claim 37, wherein said defined set of conditions results in a level of hybridization that is the same as the level of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C.
 42. The composition of claim 41 wherein, in (G) said defined set of conditions includes the group of 24mer sequences of (G) being covalently linked to beads.
 43. The composition of claim 35, wherein all of the 4mers represented by numeral 1 are identical to each other, all of the 4mers represented by numeral 2 are identical to each other, all of the 4mers represented by numeral 3 are identical to each other, all of the 4mers represented by numeral 4 are identical to each other, all of the 4mers represented by numeral 5 are identical to each other, all of the 4mers represented by numeral 6 are identical to each other, all of the 4mers represented by numeral 7 are identical to each other, all of the 4mers represented by numeral 8 are identical to each other, all of the 4mers represented by numeral 9 are identical to each other, and all of the 4mers represented by numeral 10 are identical to each other.
 44. The composition of claim 43, wherein atleast one of the 4mers represented by the numeral 1 has the sequenceWXYY, at least one of the 4mers represented by the numeral 2 has thesequence YWXY, at least one of the 4mers represented by the numeral 3has the sequence XXXW, at least one of the 4mers represented by thenumeral 4 has the sequence YWYX, at least one of the 4mers representedby the numeral 5 has the sequence WYXY, at least one of the 4mersrepresented by the numeral 6 has the sequence YYWX, at least one of the4mers represented by the numeral 7 has the sequence YWXX, at least oneof the 4mers represented by the numeral 8 has the sequence WYXX, atleast one of the 4mers represented by the numeral 9 has the sequenceXYYW, and at least one of the 4mers represented by the numeral 10 hasthe sequence XYWX.
 45. The composition of claim 44, wherein each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, and each 10=XYWX.
 46. The composition of any of claims 1 or 3 wherein in (C)(i)(a): W=one of G and C; X=one of A and T/U; and Y=one of A and T/U.
 47. The composition of claim 46, wherein in (C)(i)(a): W=G; X=one of A and T/U; and Y=one of A and T/U.
 48. The composition of claim 47, wherein W=G; X=A; and Y=T/U.
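Claims 44 to 48 describe each 24mer as a string of numeral-coded 4mer templates whose placeholders W, X and Y are replaced by concrete bases. The Python sketch below illustrates that substitution under the assignment of claim 48 (W=G, X=A, Y=T). The helper names `instantiate` and `expand` are hypothetical, and the numeral string in the example was obtained by splitting SEQ ID NO: 1 into 4mers, not copied from Table I.

```python
# Numeral -> 4mer template for numerals 1-10 (claims 44-45).
TEMPLATES = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY",
    6: "YYWX", 7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX",
}


def instantiate(template: str, w: str, x: str, y: str) -> str:
    """Replace the placeholders W, X and Y in a template with concrete bases."""
    return template.translate(str.maketrans({"W": w, "X": x, "Y": y}))


def expand(numerals, w: str = "G", x: str = "A", y: str = "T") -> str:
    """Concatenate the instantiated 4mers for a sequence of numerals."""
    return "".join(instantiate(TEMPLATES[n], w, x, y) for n in numerals)


if __name__ == "__main__":
    # With W=G, X=A, Y=T this numeral string yields SEQ ID NO: 1.
    print(expand([1, 4, 6, 6, 1, 3]))  # GATTTGTATTGATTGAGATTAAAG
```

Other assignments allowed by claims 46 and 47 (for example W=C) are obtained by passing different bases to `expand`.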
 49. The composition of claims 1 or 3, wherein in (F)(I), said quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.1.
 50. The composition of claim 49, wherein in (F)(I), said quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.05.
 51. The composition of claim 50, wherein in (F)(I), said quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.01.
 52. The composition of claims 1 or 3, wherein in (F)(I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.15 and 0.35.
 53. The composition of claim 52, wherein in (F)(I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.2 and 0.3.
 54. The composition of claim 53, wherein in (F)(I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.21 and 0.29.
 55. The composition of claim 54, wherein in (F)(I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.22 and 0.28.
 56. The composition of claim 55, wherein in (F)(I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.23 and 0.27.
 57. The composition of claim 56, wherein in (F)(I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.24 and 0.26.
 58. The composition of claim 57, wherein in (F)(I) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is 0.25.
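The quotient recited in claims 49 to 58 is the G+C count divided by the total A, T/U, G and C count, evaluated both for each sequence and for all sequences combined. A minimal Python sketch of the check follows; the function names are hypothetical, the default limits are borrowed from proviso (E) of claim 75 as an assumption, and the tighter tolerances of claims 49 to 51 would be passed as `max_dev`.

```python
def gc_quotient(seq: str) -> float:
    """(G + C) / (A + T/U + G + C) for one sequence."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    total = gc + s.count("A") + s.count("T") + s.count("U")
    return gc / total


def check_gc(seqs, low=0.1, high=0.40, max_dev=0.2):
    """Return (combined quotient, combined within [low, high], every sequence within max_dev)."""
    combined = gc_quotient("".join(seqs))
    in_range = low <= combined <= high
    per_sequence_ok = all(abs(gc_quotient(s) - combined) <= max_dev for s in seqs)
    return combined, in_range, per_sequence_ok


if __name__ == "__main__":
    seqs = [
        "GATTTGTATTGATTGAGATTAAAG",  # SEQ ID NO: 1
        "TGATTGTAGTATGTATTGATAAAG",  # SEQ ID NO: 2
        "GATTGTAAGATTTGATAAAGTGTA",  # SEQ ID NO: 3
    ]
    print(check_gc(seqs, max_dev=0.01))  # (0.25, True, True)
```

Each template for numerals 1 to 10 contains exactly one W, so with W=G every 24mer built from six of them has a quotient of 6/24 = 0.25, the value recited in claim 58.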
 59. The composition of claims 1 or 3, wherein in (D) up to two bases can be inserted at any location of any of the sequences or up to two bases can be deleted from any of the sequences.
 60. The composition of claim 59, wherein in (D) one base can be inserted at any location of any of the sequences or one base can be deleted from any of the sequences.
 61. The composition of claim 60, wherein in (D) no base can be inserted at any location of any of the sequences.
 62. The composition of claim 60, wherein in (D) no base can be deleted from any of the sequences.
 63. The composition of claim 60, wherein in (D) no base can be inserted at or deleted from any location of any of the sequences.
 64. The composition of claims 1 or 3, wherein each of the oligonucleotides of a said set has a sequence of at least eleven contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least twelve contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least thirteen contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least fourteen contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least fifteen contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least sixteen contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least seventeen contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least eighteen contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least nineteen contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least twenty contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least twenty-one contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least twenty-two contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least twenty-three contiguous bases of the sequence on which it is based; or wherein each of the oligonucleotides of a said set has a sequence of at least twenty-four contiguous bases of the sequence on which it is based.
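Claim 64 (like the ten-base floor of claim 75) asks whether an oligonucleotide carries a run of at least k contiguous bases taken from the sequence on which it is based. One straightforward test, sketched below in Python as an assumption rather than a prescribed procedure, is to collect every k-base window of the parent and look for any of them in the oligonucleotide; the function name is hypothetical.

```python
def shares_contiguous_run(oligo: str, parent: str, k: int) -> bool:
    """True if `oligo` contains at least k contiguous bases copied from `parent`."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Any run of length >= k contains a run of exactly k, so k-base windows suffice.
    windows = {parent[i:i + k] for i in range(len(parent) - k + 1)}
    return any(oligo[j:j + k] in windows for j in range(len(oligo) - k + 1))


if __name__ == "__main__":
    parent = "GATTTGTATTGATTGAGATTAAAG"  # SEQ ID NO: 1
    oligo = parent[3:15]                 # hypothetical 12-base fragment of it
    for k in (10, 12, 13):
        print(k, shares_contiguous_run(oligo, parent, k))
```

The 12-base fragment satisfies k=10 and k=12 but not k=13, because it has no 13-base stretch at all.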
 65. The composition of claims 1 or 3, wherein in eacholigonucleotide of the set, there is a maximum of six bases other than Gbetween every neighboring pair of G's.
 66. The composition according toclaim 65, wherein each initial G of an oligonucleotide of the setsequence occupies a position in the terminal selected from a first,second, third, fourth, fifth, sixth or seventh position thereof.
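The spacing rule of claim 65 and the position rule of claim 66 can both be checked by locating every G in a sequence. In this Python sketch the helper names are hypothetical, and reading the "terminal" of claim 66 as the 5′ end of the sequence as written is an assumption.

```python
def max_gap_between_gs(seq: str) -> int:
    """Largest number of non-G bases between two neighbouring G's."""
    positions = [i for i, base in enumerate(seq.upper()) if base == "G"]
    if len(positions) < 2:
        raise ValueError("sequence needs at least two G's")
    return max(b - a - 1 for a, b in zip(positions, positions[1:]))


def first_g_position(seq: str) -> int:
    """1-based position of the first G counted from the 5' end (0 if none)."""
    return seq.upper().find("G") + 1


if __name__ == "__main__":
    seq = "TGATTGTAGTATGTATTGATAAAG"  # SEQ ID NO: 2
    print(max_gap_between_gs(seq), first_g_position(seq))  # 5 2
    # Claim 65 allows a gap of at most 6; claim 66 a first G within positions 1-7.
```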
 67. The composition of claims 1 or 3, wherein the contiguous bases of each oligonucleotide of a said set are selected such that the position of the first base of each said oligonucleotide within the sequence on which it is based is the same for all nucleotides of the set.
 68. The composition of claims 1 or 3, wherein each of the oligonucleotides of a said set is up to thirty bases in length; or each of the oligonucleotides of a said set is up to twenty-nine bases in length; or each of the oligonucleotides of a said set is up to twenty-eight bases in length; or each of the oligonucleotides of a said set is up to twenty-seven bases in length; or each of the oligonucleotides of a said set is up to twenty-six bases in length; or each of the oligonucleotides of a said set is up to twenty-five bases in length; or each of the oligonucleotides of a said set is up to twenty-four bases in length.
 69. The composition of claims 1 or 3, wherein each of the oligonucleotides of a said set has a length of within five bases of the average length of all of the oligonucleotides in the set; or each of the oligonucleotides of a said set has a length of within four bases of the average length of all of the oligonucleotides in the set; or each of the oligonucleotides of a said set has a length of within three bases of the average length of all of the oligonucleotides in the set; or each of the oligonucleotides of a said set has a length of within two bases of the average length of all of the oligonucleotides in the set; or each of the oligonucleotides of a said set has a length of within one base of the average length of all of the oligonucleotides in the set.
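Claims 68 and 69 combine a maximum length with a tolerance around the mean length of the set. A small Python sketch of both checks, with a hypothetical function name and illustrative lengths, might read:

```python
def length_ok(oligos, max_len: int = 30, tolerance: float = 5) -> bool:
    """True if every oligo is at most max_len bases and within `tolerance` bases of the mean length."""
    lengths = [len(o) for o in oligos]
    mean = sum(lengths) / len(lengths)
    return all(n <= max_len and abs(n - mean) <= tolerance for n in lengths)


if __name__ == "__main__":
    # Hypothetical set of a 22mer, a 24mer and a 25mer (mean length about 23.67).
    oligos = ["A" * 22, "A" * 24, "A" * 25]
    print(length_ok(oligos, max_len=25, tolerance=2))  # True
    print(length_ok(oligos, max_len=25, tolerance=1))  # False
    print(length_ok(oligos, max_len=24, tolerance=2))  # False
```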
 70. The composition of claims 1 or 3, wherein in (II)(i), any consecutive sequence of bases in the phantom sequence which is identical to a consecutive sequence of bases in each of the first and second sequences from which it is generated is no more than ((⅔×L)−1) bases in length.
 71. The composition of claims 1 or 3, wherein in (II)(ii), the phantom sequence, if greater than or equal to (¾×L) in length, contains at least 3 insertions/deletions or mismatches when compared to the first and second sequences from which it is generated.
 72. The composition of claim 71, wherein in (II)(ii), the phantom sequence, if greater than or equal to (⅔×L) in length, contains at least 3 insertions/deletions or mismatches when compared to the first and second sequences from which it is generated.
 73. The composition of claims 1 or 3, wherein in (II)(iii), the phantom sequence is not greater than or equal to (⅚×L) in length.
 74. The composition of claim 73, wherein in (II)(iii), the phantom sequence is not greater than or equal to (¾×L) in length.
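Claims 70 to 74 scale their phantom-sequence limits with L, which is defined in claim 1 and not reproduced in this section. Assuming L is the tag length (24 for the 24mer family), the limits work out as in the Python sketch below; the function name is hypothetical, and exact fractions are used so that the limits are not blurred by floating-point rounding.

```python
from fractions import Fraction


def phantom_limits(L: int) -> dict:
    """Length limits of claims 70-74, keyed by claim number (L assumed to be the tag length)."""
    return {
        70: Fraction(2, 3) * L - 1,  # longest identical run allowed
        71: Fraction(3, 4) * L,      # phantoms this long need >= 3 indels/mismatches
        72: Fraction(2, 3) * L,      # stricter variant of claim 71
        73: Fraction(5, 6) * L,      # phantoms may not reach this length
        74: Fraction(3, 4) * L,      # stricter variant of claim 73
    }


if __name__ == "__main__":
    for claim, value in phantom_limits(24).items():
        print(f"claim {claim}: {value}")
```

For L=24 this gives 15, 18, 16, 20 and 18 bases for claims 70 to 74 respectively.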
 75. A composition comprising molecules for use as tags or tag complements wherein each molecule comprises an oligonucleotide selected from a set of oligonucleotides based on the following group of sequences having the one hundred sequence identifiers of the sequences tested in Example 2 as set out in Table I, wherein: (A) each 1=WXYY, each 2=YWXY, each 3=XXXW, each 4=YWYX, each 5=WYXY, each 6=YYWX, each 7=YWXX, each 8=WYXX, each 9=XYYW, each 10=XYWX, each 11=YYXW, each 12=WYYX, each 13=XYXW, each 14=WYYY, each 15=WXYW, each 16=WYXW, each 17=WXXW, each 18=WYYW, each 19=XYYX, each 20=YXYX, each 21=YXXY and each 22=XYXY; (B) each of W, X and Y is a base in which either: (i) (a) W=one of A, T/U, G, and C, X=one of A, T/U, G, and C, Y=one of A, T/U, G, and C, and each of W, X and Y is selected so as to be different from all of the others of W, X and Y, (b) an unselected said base of (i)(a) can be substituted any number of times for any one of W, X and Y, or (ii) (a) W=G or C, X=A or T/U, Y=A or T/U, and X≠Y, and (b) a base not selected in (ii)(a) can be inserted into each sequence at one or more locations, the location of each insertion being the same in all the sequences; (C) up to three bases can be inserted at any location of any of the sequences or up to three bases can be deleted from any of the sequences; (D) all of the sequences of a said group of oligonucleotides are read 5′ to 3′ or are read 3′ to 5′; and wherein each oligonucleotide of a said set has a sequence of at least ten contiguous bases of the sequence on which it is based, provided that: (E) the quotient of the sum of G and C divided by the sum of A, T/U, G and C for all combined sequences of the set is between about 0.1 and 0.40 and said quotient for each sequence of the set does not vary from the quotient for the combined sequences by more than 0.2; and (F) for the group of 24mer sequences in which each 1=GATT, each 2=TGAT, each 3=AAAG, each 4=TGTA, each 5=GTAT, each 6=TTGA, each 7=TGAA, each 8=GTAA, each 9=ATTG, each 10=ATGA, each 11=TTAG, each 12=GTTA, each 13=ATAG, each 14=GTTT, each 15=GATG, each 16=GTAG, each 17=GAAG, each 18=GTTG, each 19=ATTA, each 20=TATA, each 21=TAAT and each 22=ATAT, under a defined set of conditions in which the maximum degree of hybridization between a sequence and any complement of a different sequence of the group of 24mer sequences does not exceed 30% of the degree of hybridization between said sequence and its complement, for all oligonucleotides of the set, the maximum degree of hybridization between an oligonucleotide and a complement of any other oligonucleotide of the set does not exceed 50% of the degree of hybridization of the oligonucleotide and its complement; wherein any base present may be substituted by an analogue thereof.
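Proviso (F) lists the concrete 4mer for each of the 22 numerals of (A). The Python sketch below checks that this list is exactly the substitution W=G, X=A, Y=T applied to the templates of (A), and then reads a 24mer back into its numeral string. The function name `decompose` is hypothetical, and the sketch assumes the 4mers sit in consecutive, non-overlapping frames starting at the first base.

```python
# Templates of (A) and the concrete 4mers of (F), both keyed by numeral.
TEMPLATES = {
    1: "WXYY", 2: "YWXY", 3: "XXXW", 4: "YWYX", 5: "WYXY", 6: "YYWX",
    7: "YWXX", 8: "WYXX", 9: "XYYW", 10: "XYWX", 11: "YYXW", 12: "WYYX",
    13: "XYXW", 14: "WYYY", 15: "WXYW", 16: "WYXW", 17: "WXXW", 18: "WYYW",
    19: "XYYX", 20: "YXYX", 21: "YXXY", 22: "XYXY",
}
LISTED = {
    1: "GATT", 2: "TGAT", 3: "AAAG", 4: "TGTA", 5: "GTAT", 6: "TTGA",
    7: "TGAA", 8: "GTAA", 9: "ATTG", 10: "ATGA", 11: "TTAG", 12: "GTTA",
    13: "ATAG", 14: "GTTT", 15: "GATG", 16: "GTAG", 17: "GAAG", 18: "GTTG",
    19: "ATTA", 20: "TATA", 21: "TAAT", 22: "ATAT",
}


def decompose(seq: str, table: dict) -> list:
    """Split seq into consecutive 4mers and map each back to its numeral.

    Raises KeyError if a 4mer is not in the table.
    """
    if len(seq) % 4:
        raise ValueError("length must be a multiple of 4")
    inverse = {tetramer: numeral for numeral, tetramer in table.items()}
    return [inverse[seq[i:i + 4]] for i in range(0, len(seq), 4)]


if __name__ == "__main__":
    substituted = {n: t.translate(str.maketrans("WXY", "GAT")) for n, t in TEMPLATES.items()}
    print(substituted == LISTED)  # True
    print(decompose("GATTTGTATTGATTGAGATTAAAG", LISTED))  # [1, 4, 6, 6, 1, 3]
```

Because the 22 listed 4mers are all distinct, the reverse lookup is unambiguous once the reading frame is fixed.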
 76. The composition of claim 75 wherein the contiguous bases of each oligonucleotide of a said set are selected such that the position of the first base of each said oligonucleotide within the sequence on which it is based is the same for all nucleotides of the set.
 77. The composition of claim 75 wherein, subject to the provisos of (E) and (F), each oligonucleotide of a said set comprises a said sequence of twenty-four contiguous bases of the sequence on which it is based.
 78. The composition of claim 75 wherein, subject to the proviso of (F), each oligonucleotide of a said set comprises a said sequence of twenty-four contiguous bases of the sequence on which it is based.
 79. The composition of claim 75, wherein in (B): W=one of G and C; X=one of A and T/U; and Y=one of A and T/U.
 80. The composition of claim 79, wherein in (B): W=G; X=one of A and T/U; and Y=one of A and T/U.
 81. The composition of claim 75, wherein the composition includes at least ten said molecules, or at least eleven said molecules, or at least twelve said molecules, or at least thirteen said molecules, or at least fourteen said molecules, or at least fifteen said molecules, or at least sixteen said molecules, or at least seventeen said molecules, or at least eighteen said molecules, or at least nineteen said molecules, or at least twenty said molecules, or at least twenty-one said molecules, or at least twenty-two said molecules, or at least twenty-three said molecules, or at least twenty-four said molecules, or at least twenty-five said molecules, or at least twenty-six said molecules, or at least twenty-seven said molecules, or at least twenty-eight said molecules, or at least twenty-nine said molecules, or at least thirty said molecules, or at least thirty-one said molecules, or at least thirty-two said molecules, or at least thirty-three said molecules, or at least thirty-four said molecules, or at least thirty-five said molecules, or at least thirty-six said molecules, or at least thirty-seven said molecules, or at least thirty-eight said molecules, or at least thirty-nine said molecules, or at least forty said molecules, or at least forty-one said molecules, or at least forty-two said molecules, or at least forty-three said molecules, or at least forty-four said molecules, or at least forty-five said molecules, or at least forty-six said molecules, or at least forty-seven said molecules, or at least forty-eight said molecules, or at least forty-nine said molecules, or at least fifty said molecules, or at least sixty said molecules, or at least seventy said molecules, or at least eighty said molecules, or at least ninety said molecules, or at least one hundred said molecules.
 82. A composition of claims 1, 3, or 75, wherein each molecule is linked to a solid phase support so as to be distinguishable from a mixture of said molecules by hybridization to its complement.
 83. The composition of claim 82, wherein each molecule is linked to a defined location on a said solid phase support, the defined location for each said molecule being different than the defined location for different other said molecules.
 84. The composition of claim 82, wherein each said solid phase support is a microparticle and each said molecule is covalently linked to a different microparticle than each other different said molecule.
 85. A composition according to any of claims 1, 3, or 75, wherein each said molecule comprises a tag complement.
 86. A kit for sorting and identifying polynucleotides, the kit comprising one or more solid phase supports each having one or more spatially discrete regions, each such region having a uniform population of substantially identical tag complements covalently attached, and the tag complements each being selected from the set of oligonucleotides as defined in any of claims 1 to 85.
 87. A kit according to claim 86, wherein there is a tag complement for each said oligonucleotide of a said composition.
 88. A kit according to claim 86 wherein said one or more solid phase supports is a planar substrate and wherein said one or more spatially discrete regions is a plurality of spatially addressable regions.
 89. A kit according to claim 86 wherein said one or more solid phase supports is a plurality of microparticles.
 90. A kit according to claim 89 wherein said microparticles each have a diameter in the range of from 5 to 40 μm.
 91. A kit according to claim 89 or 90, wherein each microparticle is spectrophotometrically unique from each other microparticle having a different oligonucleotide attached thereto.
 92. A method of analyzing a biological sample comprising a biological sequence for the presence of a mutation or polymorphism at a locus of the nucleic acid, the method comprising: (A) amplifying the nucleic acid molecule in the presence of a first primer having a 5′-sequence having the sequence of a tag complementary to the sequence of a tag complement belonging to a family of tag complements as defined in claim 85 to form an amplified molecule with a 5′-end with a sequence complementary to the sequence of the tag; (B) extending the amplified molecule in the presence of a polymerase and a second primer having a 5′-end complementary to the 3′-end of the amplified sequence, with the 3′-end of the second primer extending to immediately adjacent said locus, in the presence of a plurality of nucleoside triphosphate derivatives each of which is: (i) capable of incorporation during transcription by the polymerase onto the 3′-end of a growing nucleotide strand; (ii) causes termination of polymerization; and (iii) capable of differential detection, one from the other, wherein there is a said derivative complementary to each possible nucleotide present at said locus of the amplified sequence; (C) specifically hybridizing the second primer to a tag complement having the tag complement sequence of (A); and (D) detecting the nucleotide derivative incorporated into the second primer in (B) so as to identify the base located at the locus of the nucleic acid.
 93. A method of analyzing a biological sample comprising a plurality of nucleic acid molecules for the presence of a mutation or polymorphism at a locus of each nucleic acid molecule, for each nucleic acid molecule, the method comprising: (A) amplifying the nucleic acid molecule in the presence of a first primer having a 5′-sequence having the sequence of a tag complementary to the sequence of a tag complement belonging to a family of tag complements as defined in claim 85 to form an amplified molecule with a 5′-end with a sequence complementary to the sequence of the tag; (B) extending the amplified molecule in the presence of a polymerase and a second primer having a 5′-end complementary to the 3′-end of the amplified sequence, the 3′-end of the second primer extending to immediately adjacent said locus, in the presence of a plurality of nucleoside triphosphate derivatives each of which is: (i) capable of incorporation during transcription by the polymerase onto the 3′-end of a growing nucleotide strand; (ii) causes termination of polymerization; and (iii) capable of differential detection, one from the other, wherein there is a said derivative complementary to each possible nucleotide present at said locus of the amplified molecule; (C) specifically hybridizing the second primer to a tag complement having the tag complement sequence of (A); and (D) detecting the nucleotide derivative incorporated into the second primer in (B) so as to identify the base located at the locus of the nucleic acid; wherein each tag of (A) is unique for each nucleic acid molecule and steps (A) and (B) are carried out with said nucleic acid molecules in the presence of each other.
 94. A method of analyzing a biological sample comprising a plurality of double stranded complementary nucleic acid molecules for the presence of a mutation or polymorphism at a locus of each nucleic acid molecule, for each nucleic acid molecule, the method comprising: (A) amplifying the double stranded molecule in the presence of a pair of first primers, each primer having an identical 5′-sequence having the sequence of a tag complementary to the sequence of a tag complement belonging to a family of tag complements as defined in claim 85 to form amplified molecules with 5′-ends with a sequence complementary to the sequence of the tag; (B) extending the amplified molecules in the presence of a polymerase and a pair of second primers, each second primer having a 5′-end complementary to a 3′-end of the amplified sequence, the 3′-end of each said second primer extending to immediately adjacent said locus, in the presence of a plurality of nucleoside triphosphate derivatives each of which is: (i) capable of incorporation during transcription by the polymerase onto the 3′-end of a growing nucleotide strand; (ii) causes termination of polymerization; and (iii) capable of differential detection, one from the other; (C) specifically hybridizing each of the second primers to a tag complement having the tag complement sequence of (A); and (D) detecting the nucleotide derivative incorporated into the second primers in (B) so as to identify the base located at said locus; wherein the sequence of each tag of (A) is unique for each nucleic acid molecule and steps (A) and (B) are carried out with said nucleic acid molecules in the presence of each other.
 95. A method of analyzing a biological sample comprising a plurality of nucleic acid molecules for the presence of a mutation or polymorphism at a locus of each nucleic acid molecule, for each nucleic acid molecule, the method comprising: (a) hybridizing the molecule and a primer, the primer having a 5′-sequence having the sequence of a tag complementary to the sequence of a tag complement belonging to a family of tag complements as defined in claim 85 and a 3′-end extending to immediately adjacent the locus; (b) enzymatically extending the 3′-end of the primer in the presence of a plurality of nucleoside triphosphate derivatives each of which is: (i) capable of enzymatic incorporation onto the 3′-end of a growing nucleotide strand; (ii) causes termination of said extension; and (iii) capable of differential detection, one from the other, wherein there is a said derivative complementary to each possible nucleotide present at said locus; (c) specifically hybridizing the extended primer formed in step (b) to a tag complement having the tag complement sequence of (a); and (d) detecting the nucleotide derivative incorporated into the primer in step (b) so as to identify the base located at the locus of the nucleic acid molecule; wherein each tag of (a) is unique for each nucleic acid molecule and steps (a) and (b) are carried out with said nucleic acid molecules in the presence of each other.
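Claims 92 to 95 read out each genotype from two independent signals: the tag complement that captures an extended primer identifies the locus, and the label of the single chain-terminating nucleotide identifies the base. The Python sketch below models only that bookkeeping; the label names, tag names and the `call_genotypes` and `extend_by_one` helpers are hypothetical, and no chemistry is simulated.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hypothetical assignment of one distinguishable label per terminating nucleotide.
LABEL_OF = {"A": "label_1", "C": "label_2", "G": "label_3", "T": "label_4"}
NUCLEOTIDE_OF = {label: nt for nt, label in LABEL_OF.items()}


def extend_by_one(locus_base: str) -> str:
    """The polymerase adds the terminator complementary to the template base; return its label."""
    return LABEL_OF[COMPLEMENT[locus_base]]


def call_genotypes(tag_of_locus: dict, signal_at_tag: dict) -> dict:
    """Map each locus to its base using the label detected at its tag complement."""
    calls = {}
    for locus, tag in tag_of_locus.items():
        incorporated = NUCLEOTIDE_OF[signal_at_tag[tag]]
        calls[locus] = COMPLEMENT[incorporated]  # template base pairs with the terminator
    return calls


if __name__ == "__main__":
    tag_of_locus = {"locus_1": "tag_A", "locus_2": "tag_B"}  # one unique tag per locus
    template_bases = {"locus_1": "G", "locus_2": "T"}        # unknown in a real assay
    signals = {tag_of_locus[loc]: extend_by_one(b) for loc, b in template_bases.items()}
    print(call_genotypes(tag_of_locus, signals))  # {'locus_1': 'G', 'locus_2': 'T'}
```

Because every locus carries a unique tag, the reactions can be pooled as the claims require and still be separated at the readout step.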
 96. The method of claim 93 wherein each said derivative is a dideoxy nucleoside triphosphate.
 97. The method of claim 95, wherein each respective complement is attached as a uniform population of substantially identical complements in a spatially discrete region on one or more said solid phase supports.
 98. The method of claim 97, wherein each said tag complement comprises a label, each such label being different for respective complements, and step (d) includes detecting the presence of the different labels for respective hybridization complexes of bound tags and tag complements.
 99. The hybridized molecule and primer of step (A) of claim 95.
 100. A method of determining the presence of a target suspected of being contained in a mixture, the method comprising the steps of: (i) labelling the target with a first label; (ii) providing a first detection moiety capable of specific binding to the target and including a first tag; (iii) exposing a sample of the mixture to the detection moiety under conditions suitable to permit (or cause) said specific binding of the molecule and target; (iv) providing a family of tag complements as defined in claim 85 wherein the family contains a first tag complement having a sequence complementary to that of the first tag; (v) exposing the sample to the family of tag complements under conditions suitable to permit (or cause) specific hybridization of the first tag and its tag complement; (vi) determining whether a said first detection moiety hybridized to a first said tag complement is bound to a said labelled target in order to determine the presence or absence of said target in the mixture.
 101. The method of claim 100 wherein said first tag complement is linked to a solid support at a specific location of the support and step (vi) includes detecting the presence of the first label at said specified location.
 102. The method of claim 100 wherein said first tag complement comprises a second label and step (vi) includes detecting the presence of the first and second labels in a hybridized complex of the moiety and the first tag complement.
 103. The method of claim 100 wherein said target is selected from the group consisting of organic molecules, antigens, proteins, polypeptides, antibodies and nucleic acids.
 104. The method of claim 103, wherein said target is an antigen and said first molecule is an antibody specific for said antigen.
 105. The method of claim 104, wherein the antigen is a polypeptide or protein and the labelling step includes conjugation of fluorescent molecules, digoxigenin, biotinylation and the like.
 106. The method of claim 105, wherein said target is a nucleic acid and the labelling step includes incorporation of fluorescent molecules, radiolabelled nucleotide, digoxigenin, biotinylation and the like.
 107. A set of molecules consisting essentially of two or more of the following-listed sequences for use as tags or tag complements:
 GATTTGTATTGATTGAGATTAAAG (SEQ ID NO: 1)
 TGATTGTAGTATGTATTGATAAAG (SEQ ID NO: 2)
 GATTGTAAGATTTGATAAAGTGTA (SEQ ID NO: 3)
 GATTTGAAGATTATTGGTAATGTA (SEQ ID NO: 4)
 GATTGATTATTGTGATTTGAATTG (SEQ ID NO: 5)
 GATTTGATTGTAAAAGATTGTTGA (SEQ ID NO: 6)
 ATTGGTAAATTGGTAAATGAATTG (SEQ ID NO: 7)
 ATTGGATTTGATAAAGGTAAATGA (SEQ ID NO: 8)
 GTAAGTAATGAATGTAAAAGGATT (SEQ ID NO: 9)
 GATTGATTGATTGATTGATTTGAT (SEQ ID NO: 10)
 TGATGATTAAAGAAAGTGATTGAT (SEQ ID NO: 11)
 AAAGGATTTGATTGATAAAGTGAT (SEQ ID NO: 12)
 TGTAGATTTGTATGTATGTATGAT (SEQ ID NO: 13)
 GATTTGATAAAGAAAGGATTGATT (SEQ ID NO: 14)
 GATTAAAGTGATTGATGATTTGTA (SEQ ID NO: 15)
 AAAGAAAGAAAGAAAGAAAGTGTA (SEQ ID NO: 16)
 TGTAAAAGGATTGATTTGTATGTA (SEQ ID NO: 17)
 AAAGTGTAGATTGATTAAAGAAAG (SEQ ID NO: 18)
 AAAGTTGATTGATTGAAAAGGTAT (SEQ ID NO: 19)
 TTGATTGAGATTGATTTTGAGTAT (SEQ ID NO: 20)
 TGAATTGATGAATGAATGAAGTAT (SEQ ID NO: 21)
 GTAATGAAGTATGTATGTAAGTAA (SEQ ID NO: 22)
 TGATGATTTGAATGAAGATTGATT (SEQ ID NO: 23)
 TGATAAAGTGATAAAGGATTAAAG (SEQ ID NO: 24)
 TGATTTGAGTATTTGAGATTTTGA (SEQ ID NO: 25)
 TGTAGTAAGATTGATTAAAGGTAA (SEQ ID NO: 26)
 GTATAAAGGATTGATTTTGAAAAG (SEQ ID NO: 27)
 GTATTTGAGTAAGTAATTGATTGA (SEQ ID NO: 28)
 GTAAAAAGTTGAGTATTGAAAAAG (SEQ ID NO: 29)
 GATTTGATAAAGGATTTGTATTGA (SEQ ID NO: 30)
 GATTGTATTGAAGTATTGTAAAAG (SEQ ID NO: 31)
 TGATGATTTTGATGAAAAAGTTGA (SEQ ID NO: 32)
 TGATTTGAGATTAAAGAAAGGATT (SEQ ID NO: 33)
 TGATTGAATTGAGTAAAAAGGATT (SEQ ID NO: 34)
 AAAGTGTAAAAGGATTTGATGTAT (SEQ ID NO: 35)
 AAAGGTATTTGAGATTTGATTGAA (SEQ ID NO: 36)
 AAAGTTGAGATTTGAATGATTGAA (SEQ ID NO: 37)
 TGTATTGAAAAGGTATGATTTGAA (SEQ ID NO: 38)
 GTATTGTATTGAAAAGGTAATTGA (SEQ ID NO: 39)
 TTGAGTAATGATAAAGTGAAGATT (SEQ ID NO: 40)
 TGAAGATTTGAAGTAATTGAAAAG (SEQ ID NO: 41)
 TGAAAAAGTGTAGATTTTGAGTAA (SEQ ID NO: 42)
 TGTATGAATGAAGATTTGATTGTA (SEQ ID NO: 43)
 AAAGTTGAGTATTGATTTGAAAAG (SEQ ID NO: 44)
 GATTTGTAGATTTGTATTGAGATT (SEQ ID NO: 45)
 AAAGAAAGGATTTGTAGTAAGATT (SEQ ID NO: 46)
 GTAAAAAGAAAGGTATAAAGGTAA (SEQ ID NO: 47)
 GATTAAAGTTGATTGAAAAGTGAA (SEQ ID NO: 48)
 TGAAAAAGGTAATTGATGTATGAA (SEQ ID NO: 49)
 AAAGGATTAAAGTGAAGTAATTGA (SEQ ID NO: 50)
 ATGAATTGGTATGTATATGAATGA (SEQ ID NO: 51)
 TGAAATGAATGAATGATGAAATTG (SEQ ID NO: 52)
 ATTGATTGTGAATGAAATGAATTG (SEQ ID NO: 53)
 ATTGAAAGATGAAAAGATGAAAAG (SEQ ID NO: 54)
 ATTGTTGAAAAGTGTAATGATTGA (SEQ ID NO: 55)
 ATGATGTAATGAAAAGATTGTGTA (SEQ ID NO: 56)
 AAAGATTGAAAGATGATGTAATTG (SEQ ID NO: 57)
 ATTGATGAGTATATTGTGTAGTAA (SEQ ID NO: 58)
 AAAGATTGTGTAATTGATGATGAA (SEQ ID NO: 59)
 AAAGGTATATTGTGTAATGAGTAA (SEQ ID NO: 60)
 TGTAATGAGTATTGTAATTGAAAG (SEQ ID NO: 61)
 GTATAAAGAAAGATTGGTAAATGA (SEQ ID NO: 62)
 TTGAGTAATTGAATTGTGAAATGA (SEQ ID NO: 63)
 TGTATTGAATGAATTGTTGATGTA (SEQ ID NO: 64)
 TGTAATTGGTAAATGAGTAAAAAG (SEQ ID NO: 65)
 TGAATGAAATTGATGAGTATAAAG (SEQ ID NO: 66)
 GTAAGTAAATTGAAAGATTGATGA (SEQ ID NO: 67)
 GTAAATGATGATATTGGTATATTG (SEQ ID NO: 68)
 ATTGTTGATGATTGATTGAAATGA (SEQ ID NO: 69)
 ATTGTGAAGTATAAAGATGATTGA (SEQ ID NO: 70)
 ATGAAAAGTTGAGTAAATTGTGAT (SEQ ID NO: 71)
 ATGAATTGAAAGTGATTGAAAAAG (SEQ ID NO: 72)
 GTAAATTGATGAAAAGTTGATGAT (SEQ ID NO: 73)
 AAAGTGATGTATATGAGTAAATTG (SEQ ID NO: 74)
 GTAATGATAAAGATGATGATATTG (SEQ ID NO: 75)
 TTGAAAAGATTGGTAATGATATGA (SEQ ID NO: 76)
 AAAGTGAAAAAGATTGATTGATGA (SEQ ID NO: 77)
 ATTGATGAGATTGATTATTGTGTA (SEQ ID NO: 78)
 ATGAGATTATTGGATTTGTAGATT (SEQ ID NO: 79)
 TGAAGATTATGAATTGGTAAGATT (SEQ ID NO: 80)
 ATTGGATTATGAGATTATGATTGA (SEQ ID NO: 81)
 ATTGTTGAATTGGATTAAAGATGA (SEQ ID NO: 82)
 AAAGATGAGTAAGTAAATTGGATT (SEQ ID NO: 83)
 AAAGGTAAGATTATTGATGAAAAG (SEQ ID NO: 84)
 ATTGATGAGATTAAAGTTGAATTG (SEQ ID NO: 85)
 GATTATTGGATTATGAAAAGGATT (SEQ ID NO: 86)
 GATTTGTAATTGTTGAGTAAATGA (SEQ ID NO: 87)
 AAAGAAAGATTGTTGAGATTATGA (SEQ ID NO: 88)
 GTATAAAGGATTTTGAATTGATGA (SEQ ID NO: 89)
 TTGAGATTGTAAATGAATTGTTGA (SEQ ID NO: 90)
 GTATATTGATTGTGTAATGAAAAG (SEQ ID NO: 91)
 TGATATGAATTGGATTATTGGTAT (SEQ ID NO: 92)
 ATGAATGATGAATGATGATTATTG (SEQ ID NO: 93)
 ATGAATTGATTGGATTGTAATGAT (SEQ ID NO: 94)
 GATTGTAATTGAGTAAATTGATGA (SEQ ID NO: 95)
 GATTATTGGATTAAAGGTAAATGA (SEQ ID NO: 96)
 ATTGTTGAATTGATGAGATTTGAT (SEQ ID NO: 97)
 GATTATGAGTAAATTGATTGTGAT (SEQ ID NO: 98)
 GATTATTGTTGATGAATGATATTG (SEQ ID NO: 99)
 TGTAAAAGATTGAAAGGTATGATT (SEQ ID NO: 100)
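A rough screen for cross-hybridization risk within the listed set is the longest run of identical bases shared by two tags, since that run is also the longest perfectly matched stretch between one tag and the complement of the other. This screen is an illustrative assumption, not the selection criterion recited in the claims; the Python sketch below computes it with the standard longest-common-substring dynamic program over a few of the listed sequences.

```python
from itertools import combinations


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2] by one base.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    tags = {
        1: "GATTTGTATTGATTGAGATTAAAG",
        2: "TGATTGTAGTATGTATTGATAAAG",
        3: "GATTGTAAGATTTGATAAAGTGTA",
        4: "GATTTGAAGATTATTGGTAATGTA",
        5: "GATTGATTATTGTGATTTGAATTG",
    }
    for (i, a), (j, b) in combinations(tags.items(), 2):
        print(f"SEQ ID NO: {i} vs {j}: {longest_common_run(a, b)}")
```

The quadratic table is adequate for 24mers; for all 100 sequences this is 4,950 pairs of 576 cell updates each.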


 108. The set of molecules of claim 107, wherein the set includes at least ten said sequences, or at least eleven said sequences, or at least twelve said sequences, or at least thirteen said sequences, or at least fourteen said sequences, or at least fifteen said sequences, or at least sixteen said sequences, or at least seventeen said sequences, or at least eighteen said sequences, or at least nineteen said sequences, or at least twenty said sequences, or at least twenty-one said sequences, or at least twenty-two said sequences, or at least twenty-three said sequences, or at least twenty-four said sequences, or at least twenty-five said sequences, or at least twenty-six said sequences, or at least twenty-seven said sequences, or at least twenty-eight said sequences, or at least twenty-nine said sequences, or at least thirty said sequences, or at least thirty-one said sequences, or at least thirty-two said sequences, or at least thirty-three said sequences, or at least thirty-four said sequences, or at least thirty-five said sequences, or at least thirty-six said sequences, or at least thirty-seven said sequences, or at least thirty-eight said sequences, or at least thirty-nine said sequences, or at least forty said sequences, or at least forty-one said sequences, or at least forty-two said sequences, or at least forty-three said sequences, or at least forty-four said sequences, or at least forty-five said sequences, or at least forty-six said sequences, or at least forty-seven said sequences, or at least forty-eight said sequences, or at least forty-nine said sequences, or at least fifty said sequences, or at least sixty said sequences, or at least seventy said sequences, or at least eighty said sequences, or at least ninety said sequences, or the one hundred said sequences.
 109. The set of claim 107 wherein, under a defined set of conditions in which the maximum degree of hybridization between a molecule having a first said sequence and a molecule having any complement of a different second said sequence does not exceed 30% of the degree of hybridization between a molecule having the first sequence and a molecule having its complement, for all molecules of the set, the maximum degree of hybridization between a molecule and a complement of any other molecule of the set does not exceed 50% of the degree of hybridization of the molecule and its complement.
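The specificity condition of claim 109 can be expressed on a matrix of measured hybridization signals. In the Python sketch below, `signal[i][j]` is assumed to be the hybridization of molecule i with the complement of molecule j, each row is normalised to its own perfect-match value, the function name is hypothetical, and the numbers are invented for illustration only.

```python
def passes_specificity(signal, limit: float = 0.5) -> bool:
    """True if every mismatched signal is at most `limit` times the perfect-match signal of its row."""
    n = len(signal)
    for i in range(n):
        perfect = signal[i][i]
        for j in range(n):
            if i != j and signal[i][j] > limit * perfect:
                return False
    return True


if __name__ == "__main__":
    # Hypothetical signals for three tags (rows) against three complements (columns).
    signal = [
        [100, 8, 12],
        [5, 80, 30],
        [9, 11, 120],
    ]
    print(passes_specificity(signal))             # True  (50% limit)
    print(passes_specificity(signal, limit=0.3))  # False (30 > 0.3 * 80)
```

The same function covers the 30% condition placed on the reference conditions by passing `limit=0.3`.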
 110. The set of claim 109 wherein, under said defined set of conditions, the degree of hybridization between each sequence and its complement varies by a factor of between 1 and 10, more preferably between 1 and 9, and more preferably between 1 and 8.
 111. The set of claim 110 wherein said maximum degree of hybridization between a said molecule having a first said sequence and a said molecule having any complement of a different second said sequence does not exceed 25%, more preferably 20%, more preferably 15%, more preferably 11%.
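Claims 110 and 111 summarise such a signal matrix with two numbers: the ratio between the strongest and weakest perfect-match signals, and the worst mismatch expressed as a fraction of its row's perfect match. A hedged Python sketch follows, assuming `signal[i][j]` is the hybridization of molecule i with the complement of molecule j; the function names and values are hypothetical.

```python
def perfect_match_spread(signal) -> float:
    """Ratio of the strongest to the weakest perfect-match (diagonal) signal."""
    diagonal = [signal[i][i] for i in range(len(signal))]
    return max(diagonal) / min(diagonal)


def worst_mismatch_fraction(signal) -> float:
    """Largest off-diagonal signal divided by the perfect-match signal of its row."""
    n = len(signal)
    return max(signal[i][j] / signal[i][i]
               for i in range(n) for j in range(n) if i != j)


if __name__ == "__main__":
    signal = [
        [100, 8, 12],
        [5, 80, 30],
        [9, 11, 120],
    ]  # hypothetical values
    spread = perfect_match_spread(signal)
    worst = worst_mismatch_fraction(signal)
    print(spread, worst)  # 1.5 0.375
    print(spread <= 10)   # claim 110, broadest range: True
    print(worst <= 0.25)  # claim 111, first limit: False
```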
 112. The composition of claim 110 wherein under said defined set of conditions, the maximum degree of hybridization between a molecule and a complement of any other molecule of the set is no more than 15% greater than the maximum degree of hybridization between a molecule and any complement of a molecule of the set, more preferably no more than 10% greater, more preferably no more than 5% greater.
 113. The composition of claim 109, wherein said defined set of conditions results in a degree of hybridization that is the same as the degree of hybridization obtained when hybridization conditions include 0.2 M NaCl, 0.1 M Tris, 0.08% Triton X-100, pH 8.0 at 37° C.